How to find the missing rows? - sql

I have a table as shown in the image.
The column MONTH_NO should contain months 1 to 12 for every year. For some years, we failed to load data for some months. I need a query that returns the years that do not have all 12 months, along with the missing month numbers.
Please help.

For example -
with mth
as (select level as month_no
from dual
connect by level <= 12),
yrs as (select distinct year from rag_month_dim)
select m.year, m.month_no
from (select year, month_no
from yrs, mth) m,
rag_month_dim r
where m.year = r.year(+)
and m.month_no = r.month_no(+)
group by m.year, m.month_no
having max(r.month_no) is null
order by year, month_no

Try it like this:
Paste this into an empty query window and adapt it to your needs.
MyData contains a "full" year 2013, Sept is missing in 2014 and June and Sept are missing in 2015.
DECLARE @OneToTwelve TABLE(Nmbr INT)
INSERT INTO @OneToTwelve VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
DECLARE @myData TABLE(yearNo INT, MonthNo INT)
INSERT INTO @myData VALUES
(2013,1),(2013,2),(2013,3),(2013,4),(2013,5),(2013,6),(2013,7),(2013,8),(2013,9),(2013,10),(2013,11),(2013,12)
,(2014,1),(2014,2),(2014,3),(2014,4),(2014,5),(2014,6),(2014,7),(2014,8),(2014,10),(2014,11),(2014,12)
,(2015,1),(2015,2),(2015,3),(2015,4),(2015,5),(2015,7),(2015,8),(2015,10),(2015,11),(2015,12);
WITH AllYears AS
(
SELECT DISTINCT yearNo FROM @myData
)
,AllCombinations AS
(
SELECT *
FROM @OneToTwelve AS months
CROSS JOIN AllYears
)
SELECT *
FROM AllCombinations
LEFT JOIN @myData AS md ON AllCombinations.Nmbr =md.MonthNo AND AllCombinations.yearNo=md.yearNo
WHERE md.MonthNo IS NULL

select distinct year, m.lev
from rag_month_dim a
join
(
select level lev
from dual
connect by level <= 12
) m
on 1=1
minus
select year, month_no
from rag_month_dim
order by 1, 2

select *
from (select count(*) total, year from rag_month_dim group by year) t
where total < 12
This returns each year that does not have 12 months of data, together with the number of month records it does have. It does not tell you which months are missing.
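The cross-join-then-anti-join idea used in the answers above can be tried out locally. The following is only a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the real database; the function name missing_months and the sample rows are illustrative, and the month list is built with a recursive CTE because SQLite has no CONNECT BY.

```python
import sqlite3

def missing_months(rows):
    """Return (year, month_no) pairs that are absent, for every year present in rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE rag_month_dim (year INTEGER, month_no INTEGER)")
    con.executemany("INSERT INTO rag_month_dim VALUES (?, ?)", rows)
    sql = """
        WITH RECURSIVE mth(month_no) AS (      -- months 1..12
            SELECT 1
            UNION ALL
            SELECT month_no + 1 FROM mth WHERE month_no < 12
        ),
        yrs AS (SELECT DISTINCT year FROM rag_month_dim)
        SELECT yrs.year, mth.month_no
        FROM yrs
        CROSS JOIN mth                         -- every (year, month) that should exist
        LEFT JOIN rag_month_dim r
               ON r.year = yrs.year AND r.month_no = mth.month_no
        WHERE r.month_no IS NULL               -- keep only the combinations with no match
        ORDER BY yrs.year, mth.month_no
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

# Sample data: 2013 is complete, 2014 lacks September, 2015 lacks June and September.
sample = [(2013, m) for m in range(1, 13)]
sample += [(2014, m) for m in range(1, 13) if m != 9]
sample += [(2015, m) for m in range(1, 13) if m not in (6, 9)]

print(missing_months(sample))
```

A year with no rows at all is not reported, because the list of years is taken from the table itself.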

Related

Detect if a month is missing and insert them automatically with a select statement (MSSQL)

I am trying to write a select statement which detects if a month is not existent and automatically inserts that month with a value 0. It should insert all missing months from the first entry to the last entry.
Example:
My table looks like this:
After the statement it should look like this:
You need a recursive CTE to get all the years in the table (and the missing ones, if any) and another one to get all the month numbers 1-12.
A CROSS JOIN of these CTEs is then LEFT JOINed to the table, and the result is filtered so that rows before the first year/month and after the last year/month are left out:
WITH
limits AS (
SELECT MIN(year) min_year, -- min year in the table
MAX(year) max_year, -- max year in the table
MIN(DATEFROMPARTS(year, monthnum, 1)) min_date, -- min date in the table
MAX(DATEFROMPARTS(year, monthnum, 1)) max_date -- max date in the table
FROM tablename
),
years(year) AS ( -- recursive CTE to get all the years of the table (and the missing ones if any)
SELECT min_year FROM limits
UNION ALL
SELECT year + 1
FROM years
WHERE year < (SELECT max_year FROM limits)
),
months(monthnum) AS ( -- recursive CTE to get all the month numbers 1-12
SELECT 1
UNION ALL
SELECT monthnum + 1
FROM months
WHERE monthnum < 12
)
SELECT y.year, m.monthnum,
DATENAME(MONTH, DATEFROMPARTS(y.year, m.monthnum, 1)) month,
COALESCE(value, 0) value
FROM months m CROSS JOIN years y
LEFT JOIN tablename t
ON t.year = y.year AND t.monthnum = m.monthnum
WHERE DATEFROMPARTS(y.year, m.monthnum, 1)
BETWEEN (SELECT min_date FROM limits) AND (SELECT max_date FROM limits)
ORDER BY y.year, m.monthnum
See the demo.
You should not be storing date components in two separate columns; instead, you should have just one column, with a proper date-like datatype.
One approach is to use a recursive query to generate every start of month between the earliest and latest dates in the table, then bring in the table with a left join.
In SQL Server:
with cte as (
select min(datefromparts(year, monthnum, 1)) as dt,
max(datefromparts(year, monthnum, 1)) as dt_max
from mytable
union all
select dateadd(month, 1, dt), dt_max -- carry dt_max along so both branches return two columns
from cte
where dt < dt_max
)
select c.dt, coalesce(t.value, 0) as value
from cte c
left join mytable t on datefromparts(t.year, t.monthnum, 1) = c.dt
If your data spans more than 100 months, you need to add option(maxrecursion 0) at the end of the query.
You can extract the date components in the final select if you like:
select
year(c.dt) as yr,
month(c.dt) as monthnum,
datename(month, c.dt) as monthname,
coalesce(t.value, 0) as value
from ...
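For experimenting with the gap-filling approach without a SQL Server instance, here is a rough sketch that uses SQLite through Python's sqlite3 module. It is an adaptation, not the answers' exact code: SQLite has no DATEFROMPARTS, so each month is represented by the integer year * 12 + monthnum - 1, and the function name fill_months and the sample rows are made up for illustration.

```python
import sqlite3

def fill_months(rows):
    """Return one (year, monthnum, value) row per month between the first
    and last month present in rows, using 0 where a month is missing."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (year INTEGER, monthnum INTEGER, value INTEGER)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    sql = """
        WITH RECURSIVE
        limits AS (                            -- first and last month as integer indexes
            SELECT MIN(year * 12 + monthnum - 1) AS lo,
                   MAX(year * 12 + monthnum - 1) AS hi
            FROM tablename
        ),
        idx(n) AS (                            -- every month index from lo to hi
            SELECT lo FROM limits
            UNION ALL
            SELECT n + 1 FROM idx WHERE n < (SELECT hi FROM limits)
        )
        SELECT idx.n / 12      AS year,
               idx.n % 12 + 1  AS monthnum,
               COALESCE(t.value, 0) AS value
        FROM idx
        LEFT JOIN tablename t
               ON t.year * 12 + t.monthnum - 1 = idx.n
        ORDER BY idx.n
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

# Hypothetical data: November 2020 and February 2021 exist, the two months between do not.
filled = fill_months([(2020, 11, 5), (2021, 2, 7)])
print(filled)
```

Working with a single month index avoids the separate year and month CTEs and the final BETWEEN filter, at the cost of converting back to year and month in the SELECT list.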

Calculate profit of successive years by adding profit of previous year

Sample data and expected result is provided in the image:
We need to add each year's profit to the running total of the previous years and display the data in the format shown in the image (sample data is also provided in the image).
Please help me with the SQL query to solve this problem.
You can also write this using the window function.
SELECT
Year,
SUM(Profit) OVER(ORDER BY Year) AS Total_Profit
FROM your_table
ORDER BY Year
This is probably the world's simplest recursive CTE which you could have googled.
But here it is:
declare @years table(y int, p int)
insert @years values (2015,1000),(2016,2000),(2017,500),(2018,1000)
; with cumulative as
(
select y, p from @years where y = (select min(y) from @years) -- anchor: earliest year
union all
select y.y, y.p+c.p
from @years y
join cumulative c on y.y=c.y+1
)
select * from cumulative
Result:
y p
2015 1000
2016 3000
2017 3500
2018 4500
Use SUM over a window:
WITH V1 AS (
SELECT 2015 AS YEAR, 1000 AS PROFIT FROM DUAL
UNION ALL SELECT 2016 AS YEAR, 2000 AS PROFIT FROM DUAL
UNION ALL SELECT 2017 AS YEAR, 500 AS PROFIT FROM DUAL
UNION ALL SELECT 2018 AS YEAR, 1000 AS PROFIT FROM DUAL
)
SELECT
V1.YEAR
, PROFIT --- You can comment it if not needed
, SUM(PROFIT) OVER (PARTITION BY 1 ORDER BY YEAR RANGE UNBOUNDED PRECEDING) AS PROFIT_CUM
FROM V1;
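The window-function version is easy to check against a plain running sum. The sketch below assumes SQLite 3.25 or later (the first release with window functions) via Python's sqlite3 module, and compares the query result with itertools.accumulate; the function name running_total is illustrative.

```python
import sqlite3
from itertools import accumulate

def running_total(rows):
    """Cumulative profit per year, computed with SUM(...) OVER (ORDER BY Year)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (Year INTEGER, Profit INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    sql = """
        SELECT Year,
               SUM(Profit) OVER (ORDER BY Year) AS Total_Profit
        FROM your_table
        ORDER BY Year
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

profits = [(2015, 1000), (2016, 2000), (2017, 500), (2018, 1000)]
totals = running_total(profits)

# Cross-check with a pure-Python running sum over the same data.
expected = list(zip((y for y, _ in profits), accumulate(p for _, p in profits)))
print(totals, totals == expected)
```

Unlike the recursive CTE, the window function does not require the years to be consecutive.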

get list of student with attendance min15days in a month and come for continuous 4 months in a year

I need a query to get the list of students who attended their class for at least 15 days in a month for 4 consecutive months.
The table may look like this:
studentid monthyear attendance
1 Apr2018 16
1 May2018 23
1 Jun2018 18
1 Jul2018 16
1 Aug2018 25
2 Apr2018 2
2 May2018 15
and so on...
Db fiddle
Try this query:
select @rn := 0;
select studentid from (
select studentid, month(dt) - (@rn := @rn + 1) grp from (
select * ,
str_to_date(concat('01 ', insert(monthyear, 4, 0, ' ')), '%d %M %Y') dt
from tbl
where attendance >= 15 -- only those records where attendance is at least 15
) a where year(dt) = 2018 -- particular year
order by studentid,dt
) a group by studentid,grp having count(*) >= 4
Demo - I expanded your data with some more cases :)
The idea is simple: if a student has attended for several consecutive months, the month numbers increase by one each time, just like a row number. So I take the difference between the month and the row number. For consecutive months that difference is constant, so it is enough to group by the difference and keep the groups whose count is >= 4 :)
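The "month minus row number" trick is independent of SQL, so it can be illustrated in a few lines of plain Python. This is only a sketch under simplifying assumptions: months are already given as numbers within a single year, and the function name students_with_streak and student 3's rows are invented for the example.

```python
from itertools import groupby

def students_with_streak(rows, min_days=15, streak=4):
    """rows: (studentid, month_number, attendance) for one year.
    Returns the students with at least `streak` consecutive qualifying months."""
    # Keep only qualifying months, ordered by student and then by month.
    qualifying = sorted((s, m) for s, m, a in rows if a >= min_days)
    result = set()
    for student, grp in groupby(qualifying, key=lambda r: r[0]):
        months = [m for _, m in grp]
        # month - position stays constant inside a run of consecutive months.
        keys = [m - i for i, m in enumerate(months)]
        for _, run in groupby(keys):
            if len(list(run)) >= streak:
                result.add(student)
    return sorted(result)

rows = [
    (1, 4, 16), (1, 5, 23), (1, 6, 18), (1, 7, 16), (1, 8, 25),  # Apr-Aug, all >= 15
    (2, 4, 2), (2, 5, 15),                                       # only one qualifying month
    (3, 1, 20), (3, 2, 20), (3, 3, 20), (3, 5, 20), (3, 6, 20),  # runs of 3 and 2
]
print(students_with_streak(rows))
```

Here the position restarts for each student, whereas the query numbers all rows in one sequence and groups by student and difference; both give a constant difference within a run.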
UPDATE
For SQL Server:
select studentid from (
select studentid, month(dt) - row_number() over (order by studentid, dt) grp from (
select * ,
cast(concat('01 ', stuff(monthyear, 4, 0, ' ')) as date) dt
from tbl
where attendance >= 15 --only those records where attendance is at least 15
) a where year(dt) = 2018 --particular year
) a group by studentid, grp having count(*) >= 4
SQL Server demo
In general, a simple self join that checks the difference in months would suffice.
In this case, the monthyear column has to be converted to a date inside the join condition itself.
The query, without the conversion:
SELECT t1.studentid, count(*) as cnt
FROM
table t1
INNER JOIN table t2 ON t1.studentid = t2.studentid AND
t2.attendance >= 15
AND t1.monthyear BETWEEN (t2.monthyear - 3) AND t2.monthyear
WHERE
t1.attendance >= 15
GROUP BY
t1.studentid
HAVING
count(*) >=4
The conversion is as follows:
STR_TO_DATE(
CONCAT(SUBSTR(t1.monthyear,1, LENGTH(t1.monthyear) - 4),' ', RIGHT(t1.monthyear, 4)), '%M %Y')
so the query should be:
SELECT t1.studentid, count(*) as cnt
FROM
table t1
INNER JOIN table t2 ON t1.studentid = t2.studentid AND
t2.attendance >= 15
AND STR_TO_DATE(
CONCAT(SUBSTR(t1.monthyear,1, LENGTH(t1.monthyear) - 4),' ', RIGHT(t1.monthyear, 4)), '%M %Y') BETWEEN DATE_SUB(STR_TO_DATE(
CONCAT(SUBSTR(t2.monthyear,1, LENGTH(t2.monthyear) - 4),' ', RIGHT(t2.monthyear, 4)), '%M %Y'), INTERVAL 3 MONTH) AND STR_TO_DATE(
CONCAT(SUBSTR(t2.monthyear,1, LENGTH(t2.monthyear) - 4),' ', RIGHT(t2.monthyear, 4)), '%M %Y')
WHERE
t1.attendance >= 15
GROUP BY
t1.studentid
HAVING
count(*) >=4
I think this is the simplest method:
select distinct studentid
from (select t.*, cast(monthyear as date) as my,
lag(cast(monthyear as date), 3) over (partition by studentid order by cast(monthyear as date)) as prev_my
from tbl t
where attendance >= 15
) t
where prev_my = dateadd(month, -3, my);
Here is a db<>fiddle.
The logic is pretty simple:
Only consider rows that satisfy the attendance criterion.
Use LAG() to look three records back.
If every month in between meets the attendance criterion, that record will be exactly 3 months earlier.
The select distinct is because you want students, not the specific periods.
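The LAG() logic listed above can be reproduced in SQLite for a quick check. This sketch assumes SQLite 3.25 or later (window functions) through Python's sqlite3 module, and it sidesteps the date conversion by storing year and month as integers and comparing the month index year * 12 + month; the column names yr and mon, the function name lag_streak_students, and student 3's rows are illustrative.

```python
import sqlite3

def lag_streak_students(rows):
    """rows: (studentid, yr, mon, attendance).
    Returns students with 4 consecutive months of attendance >= 15."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (studentid INTEGER, yr INTEGER, mon INTEGER, attendance INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT DISTINCT studentid
        FROM (
            SELECT studentid,
                   yr * 12 + mon AS ym,
                   -- month index of the qualifying row three positions earlier
                   LAG(yr * 12 + mon, 3) OVER (
                       PARTITION BY studentid ORDER BY yr * 12 + mon
                   ) AS prev_ym
            FROM tbl
            WHERE attendance >= 15
        )
        WHERE prev_ym = ym - 3      -- exactly 3 months back means no gap in between
        ORDER BY studentid
    """
    result = [r[0] for r in con.execute(sql)]
    con.close()
    return result

data = [
    (1, 2018, 4, 16), (1, 2018, 5, 23), (1, 2018, 6, 18), (1, 2018, 7, 16), (1, 2018, 8, 25),
    (2, 2018, 4, 2), (2, 2018, 5, 15),
    (3, 2018, 1, 20), (3, 2018, 2, 20), (3, 2018, 3, 20), (3, 2018, 5, 20), (3, 2018, 6, 20),
]
print(lag_streak_students(data))
```

Because the comparison uses a month index rather than the month number, a streak that crosses a year boundary (for example November to February) is also detected.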

Get valid orders at the starting day of each year

In a table containing Order information (call it Order) we have the following fields:
OrderId int
OrderDate Date
BindingTime int
Binding time is in months.
An order is called "Active" between its OrderDate and DATEADD(mm, BindingTime, OrderDate).
What I'd like to do is to group the orders by year so that if an order is "active" on the first day of a year it would be taken into account. The aim is to calculate each year's inbound and outbound orders. So the query result will be COUNT of orders and the year. And by year we mean the number of orders which were active on the first day of that year.
Note that we would like to have all the years between two given numbers in our results. E.g. if there was no active order on the first day of 2016, we would still like to have a row for (0, 2016).
I've used a recursive CTE to generate a range of years, and a left join from that range so that a 'zero' year will not be omitted
declare @YEAR1 as date = '20110101';
declare @YEAR2 as date = '20190101';
WITH YEARS AS (SELECT @YEAR1 y
UNION ALL
SELECT dateadd(year,1,y) FROM YEARS WHERE y < @YEAR2)
SELECT YEARS.y, count(t.OrderId) YearStartActiveOrders FROM YEARS
LEFT JOIN YourTable t
ON YEARS.y BETWEEN CAST(t.OrderDate as date)
AND CAST(DATEADD(mm, t.BindingTime, t.OrderDate) as date)
GROUP BY YEARS.y
Seems like what you need is a Date table (having a list of all days per year) and left joining that table with your grouped data (active order count, per day of the year).
You can use this date table from Aaron Bertrand. I generated the #dim table with the following params, to only generate two years data (2015, 2016):
DECLARE @StartDate DATE = '20150101', @NumberOfYears INT = 2;
Then you can do the following:
with ordertable as
(
select 1 as orderid, '20160101' as orderdate, 2 as bindingtime union all
select 2, '20160305', 3 union all
select 3, '20160305', 5 union all
select 4, '20150305', 5
)
select d.year, isnull(count(orderid), 0) nrActiveOrdersFirstDayOfYear
from #dim d
left join ordertable g on d.year = year(g.orderdate)
and g.orderdate = d.date
and d.FirstOfYear between g.orderdate and DATEADD(mm, g.bindingtime, OrderDate)
group by d.year
With the sample data I took as an example, you would get the result:
year nrActiveOrdersFirstDayOfYear
2015 0
2016 1
Working demo here.
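To see the "generate the years, then left join the orders" pattern produce zero rows for empty years, here is a small sketch in SQLite via Python's sqlite3 module. It is an adaptation rather than the answers' T-SQL: DATEADD is replaced by SQLite's date() modifiers, dates are stored as ISO text, and the table name Orders and the function name active_on_new_year are assumptions made for the example.

```python
import sqlite3

def active_on_new_year(orders, first_year, last_year):
    """orders: (OrderId, 'YYYY-MM-DD' OrderDate, BindingTime in months).
    Returns (year, number of orders active on 1 January of that year)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Orders (OrderId INTEGER, OrderDate TEXT, BindingTime INTEGER)")
    con.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    sql = """
        WITH RECURSIVE years(y) AS (           -- every year in the requested range
            SELECT ?
            UNION ALL
            SELECT y + 1 FROM years WHERE y < ?
        )
        SELECT years.y, COUNT(o.OrderId)       -- COUNT of a column ignores unmatched (NULL) rows
        FROM years
        LEFT JOIN Orders o
               ON printf('%04d-01-01', years.y)
                  BETWEEN o.OrderDate
                      AND date(o.OrderDate, '+' || o.BindingTime || ' months')
        GROUP BY years.y
        ORDER BY years.y
    """
    result = con.execute(sql, (first_year, last_year)).fetchall()
    con.close()
    return result

# The sample orders from the answer above, with dates written in ISO format.
orders = [
    (1, "2016-01-01", 2),
    (2, "2016-03-05", 3),
    (3, "2016-03-05", 5),
    (4, "2015-03-05", 5),
]
counts = active_on_new_year(orders, 2015, 2017)
print(counts)
```

Putting the activity condition in the ON clause rather than in WHERE is what keeps the years without any active order in the result.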

sql server rolling 12 months sum with date gaps

Suppose I have a table that indicates the number of items sold in a particular month for each sales rep. However, there will not be a row for a particular person in months where there were no sales. Example
rep_id month_yr num_sales
1 01/01/2012 3
1 05/01/2012 1
1 11/01/2012 1
2 02/01/2012 2
2 05/01/2012 1
I want to be able to create a query that shows for each rep_id and all possible months (01/01/2012, 02/01/2012, etc. through current) a rolling 12 month sales sum, like this:
rep_id month_yr R12_Sum
1 11/01/2012 5
1 12/01/2012 5
1 01/01/2013 5
1 02/01/2013 2
I have found some examples online, but the problem I'm running into is I'm missing some dates for each rep_id. Do I need to cross join or something?
To solve this problem, you need a driver table that has all year/month combinations. Then, you need to create this for each rep.
The solution is then to left join the actual data to this driver and aggregate the period that you want. Here is the query:
with months as (
select 1 as mon union all select 2 union all select 3 union all select 4 union all
select 5 as mon union all select 6 union all select 7 union all select 8 union all
select 9 as mon union all select 10 union all select 11 union all select 12
),
years as (select 2010 as yr union all select 2011 union all select 2012 union all select 2013
),
monthyears as (
select yr, mon, yr*12+mon as yrmon
from months cross join years
),
rmy as (
select *
from monthyears my cross join
(select distinct rep_id from t
) r
)
select rmy.rep_id, rmy.yr, rmy.mon, SUM(t.num_sales) as r12_sum
from rmy left join
t
on rmy.rep_id = t.rep_id and
year(t.month_yr)*12 + month(t.month_yr) between rmy.yrmon - 11 and rmy.yrmon
group by rmy.rep_id, rmy.yr, rmy.mon
order by 1, 2, 3
This hasn't been tested, so it may have syntactic errors. Also, it doesn't convert the year/month combination back to a date, leaving the values in separate columns.
Here is one solution:
SELECT
a.rep_id
,a.month_yr
,SUM(b.num_sales) AS R12_TTM
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.rep_id = b.rep_id
AND b.month_yr <= a.month_yr
AND b.month_yr >= DATEADD(MONTH, -11, a.month_yr)
GROUP BY
a.rep_id
,a.month_yr
It's certainly not pretty, but it is simpler than a CTE, numbers table or self join:
DECLARE @startdt DATETIME
SET @startdt = '2012-01-01'
SELECT rep_id, YEAR(month_yr), MONTH(month_yr), SUM(num_sales)
FROM MyTable WHERE month_yr >= @startdt AND month_yr < DATEADD(MONTH,1,@startdt)
GROUP BY rep_id, YEAR(month_yr), MONTH(month_yr)
UNION ALL
SELECT rep_id, YEAR(month_yr), MONTH(month_yr), SUM(num_sales)
FROM MyTable WHERE month_yr >= DATEADD(MONTH,1,@startdt) AND month_yr < DATEADD(MONTH,2,@startdt)
GROUP BY rep_id, YEAR(month_yr), MONTH(month_yr)
UNION ALL
SELECT rep_id, YEAR(month_yr), MONTH(month_yr), SUM(num_sales)
FROM MyTable WHERE month_yr >= DATEADD(MONTH,2,@startdt) AND month_yr < DATEADD(MONTH,3,@startdt)
GROUP BY rep_id, YEAR(month_yr), MONTH(month_yr)
UNION ALL
SELECT rep_id, YEAR(month_yr), MONTH(month_yr), SUM(num_sales)
FROM MyTable WHERE month_yr >= DATEADD(MONTH,3,@startdt) AND month_yr < DATEADD(MONTH,4,@startdt)
GROUP BY rep_id, YEAR(month_yr), MONTH(month_yr)
UNION ALL
etc etc
The following demonstrates using a CTE to generate a table of dates and generating a summary report using the CTE. Sales representatives are omitted from the results when they have had no applicable sales.
Try jiggling the reporting parameters, e.g. setting @RollingMonths to 1, for more entertainment.
-- Sample data.
declare @Sales as Table ( rep_id Int, month_yr Date, num_sales Int );
insert into @Sales ( rep_id, month_yr, num_sales ) values
( 1, '01/01/2012', 3 ),
( 1, '05/01/2012', 1 ),
( 1, '11/01/2012', 1 ),
( 2, '02/01/2012', 1 ),
( 2, '05/01/2012', 2 );
select * from @Sales;
-- Reporting parameters.
declare @ReportEnd as Date = DateAdd( day, 1 - Day( GetDate() ), GetDate() ); -- The first of the current month.
declare @ReportMonths as Int = 6; -- Number of months to report.
declare @RollingMonths as Int = 12; -- Number of months in rolling sums.
-- Report.
-- A CTE generates a table of month/year combinations covering the desired reporting time period.
with ReportingIntervals as (
select DateAdd( month, 1 - @ReportMonths, @ReportEnd ) as ReportingInterval,
DateAdd( month, 1 - @RollingMonths, DateAdd( month, 1 - @ReportMonths, @ReportEnd ) ) as FirstRollingMonth
union all
select DateAdd( month, 1, ReportingInterval ), DateAdd( month, 1, FirstRollingMonth )
from ReportingIntervals
where ReportingInterval < @ReportEnd )
-- Join the CTE with the sample data and summarize.
select RI.ReportingInterval, S.rep_id, Sum( S.num_sales ) as R12_Sum
from ReportingIntervals as RI left outer join
@Sales as S on RI.FirstRollingMonth <= S.month_yr and S.month_yr <= RI.ReportingInterval
group by RI.ReportingInterval, S.rep_id
order by RI.ReportingInterval, S.rep_id
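A variant that keeps every rep in every reporting month, including months with no sales, can be sketched by cross joining the reps with the generated months before the left join. The example below is an assumption-laden sketch in SQLite via Python's sqlite3 module: dates are stored as ISO text, months are compared through the integer index year * 12 + month - 1, the rolling window is taken to be the current month plus the 11 months before it, and the function name rolling_sums is illustrative.

```python
import sqlite3

def rolling_sums(sales, first_month, last_month, window=12):
    """sales: (rep_id, 'YYYY-MM-01', num_sales); first_month/last_month: (year, month).
    Returns (rep_id, 'YYYY-MM-01', rolling sum) for every rep and reporting month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (rep_id INTEGER, month_yr TEXT, num_sales INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", sales)
    lo = first_month[0] * 12 + first_month[1] - 1
    hi = last_month[0] * 12 + last_month[1] - 1
    sql = """
        WITH RECURSIVE months(n) AS (          -- reporting months as integer indexes
            SELECT ?
            UNION ALL
            SELECT n + 1 FROM months WHERE n < ?
        ),
        reps AS (SELECT DISTINCT rep_id FROM t),
        s AS (                                 -- sales with the same month index
            SELECT rep_id,
                   CAST(strftime('%Y', month_yr) AS INTEGER) * 12
                   + CAST(strftime('%m', month_yr) AS INTEGER) - 1 AS n,
                   num_sales
            FROM t
        )
        SELECT reps.rep_id,
               printf('%04d-%02d-01', months.n / 12, months.n % 12 + 1) AS month_yr,
               COALESCE(SUM(s.num_sales), 0) AS r12_sum
        FROM reps
        CROSS JOIN months                      -- the driver: every rep in every month
        LEFT JOIN s
               ON s.rep_id = reps.rep_id
              AND s.n BETWEEN months.n - ? AND months.n
        GROUP BY reps.rep_id, months.n
        ORDER BY reps.rep_id, months.n
    """
    result = con.execute(sql, (lo, hi, window - 1)).fetchall()
    con.close()
    return result

# The rows from the question, with dates written in ISO format.
sales = [
    (1, "2012-01-01", 3), (1, "2012-05-01", 1), (1, "2012-11-01", 1),
    (2, "2012-02-01", 2), (2, "2012-05-01", 1),
]
report = rolling_sums(sales, (2012, 11), (2013, 2))
for row in report:
    print(row)
```

With this window definition a sale drops out of the sum 12 months after its own month, so the exact month in which a total changes depends on whether the window is meant to include the current month.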