My table looks like this:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences of each ID per month within each year.
One important detail: if a record's End date falls in the month after its Start date (within the same year), the occurrence should be counted for both months. For example, the third row for ID 1 starts in January and ends in February, so it should add 1 to January and 1 to February.
The result I'd like to get is:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
This is all I have so far:
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
I don't know how to proceed from here. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy:
Note:
I have created a few intermediate tables. If you prefer, you can use subqueries or CTEs instead, depending on your permissions.
I have handled the two scenarios you described (counting a record as one occurrence or as two).
Query:
First, create a table with a flag indicating whether the start and end dates fall in the same year and month (1 means yes, 2 means no):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the table above is 1 if the start and end share the same year and month, 2 if they share the year but the month differs, and 0 otherwise. You can add more flag values if you have more scenarios.
Second, count the occurrences for flag 1. Since the year and month are the same for start and end, either date can be used. I have used the start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the start and end months differ, we can combine them with UNION ALL before counting so that both dates end up in the same column. UNION ALL rather than UNION matters here: UNION would collapse two separate records of the same ID that span the same pair of months into one:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the query above matches your expected output. I know this is not optimized, but it works. I will update the answer if I come across a better strategy. Comment if you have questions.
db<>fiddle link here
I'm not sure if this works in Spark SQL.
But if the ranges are never longer than one month, you can just add the extra occurrences to the count via a UNION ALL.
The extra occurrences are the records whose end falls in a later month than their start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
db<>fiddle here
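To check the counting rule outside the database, here is a minimal Python sketch of the same idea: every record counts once for its start month, and once more for its end month when that month is later. The function name `count_occurrences` is made up for this illustration, and the sketch relies on the same assumption as the query (ranges stay within one year).

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (id, start, end).
rows = [
    (1, date(2010, 1, 2), date(2010, 1, 4)),
    (1, date(2010, 1, 22), date(2010, 1, 24)),
    (1, date(2011, 1, 31), date(2011, 2, 2)),
    (2, date(2012, 5, 2), date(2012, 5, 8)),
    (3, date(2013, 1, 2), date(2013, 1, 3)),
    (4, date(2010, 9, 15), date(2010, 9, 20)),
    (4, date(2010, 9, 30), date(2010, 10, 5)),
]


def count_occurrences(rows):
    """Return sorted (year, month, id, occurrence) tuples."""
    counts = Counter()
    for id_, start, end in rows:
        # First half of the UNION ALL: every record counts for its start month.
        counts[(start.year, start.month, id_)] += 1
        # Second half: count the end month only when it is a later month.
        if start.month < end.month:
            counts[(end.year, end.month, id_)] += 1
    return sorted((y, m, i, n) for (y, m, i), n in counts.items())


result = count_occurrences(rows)
for row in result:
    print(row)
```

On the sample data this reproduces the seven rows of the expected output.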
I have a table like this one:
Yr | Mnth | W_ID | X_ID | Y_ID | Z_ID | Purchases | Sales | Returns |
2015 | 10 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2015 | 12 | 1 | 5210 | 1402 | 2 | 12000.00 | etc | etc |
2016 | 1 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2016 | 3 | 1 | 5210 | 1402 | 2 | etc | etc | etc |
2014 | 3 | 9 | 880 | 2 | 7 | etc | etc | etc |
2014 | 12 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 5 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 7 | 9 | 880 | 2 | 7 | etc | etc | etc |
For each combination of (W, X, Y, Z) I would like to insert the months that don't appear in the table and are between the first and last month.
In this example, for combination (W=1, X=5210, Y=1402, Z=2), I would like to have additional rows for 2015/11 and 2016/02, where Purchases, Sales and Returns are NULL. For combination (W=9, X=880, Y=2, Z=7) I would like to have additional rows for the months from 2014/04 to 2014/11, from 2015/01 to 2015/04, and for 2015/06.
I hope I have explained myself correctly.
Thank you in advance for any help you can provide.
The process is rather cumbersome in this case, but quite possible. One method uses a recursive CTE. Another uses a numbers table. I'm going to use the latter.
The idea is:
Find the minimum and maximum values for the year/month combination for each set of ids. For this, the values will be turned into months since time 0 using the formula year*12 + month.
Generate a bunch of numbers.
Generate all rows between the two values for each combination of ids.
For each generated row, use arithmetic to re-extract the year and month.
Use left join to bring in the original data.
The query looks like:
with n as (
select row_number() over (order by (select null)) - 1 as n -- start at 0
from master.dbo.spt_values
),
minmax as (
select w_id, x_id, y_id, z_id, min(yr*12 + mnth) as minyyyymm,
max(yr*12 + mnth) as maxyyyymm
from t
group by w_id, x_id, y_id, z_id
),
wxyz as (
select minmax.*, minmax.minyyyymm + n.n as yyyymm,
(minmax.minyyyymm + n.n - 1) / 12 as yyyy, -- subtract 1 so that December stays in its own year
((minmax.minyyyymm + n.n - 1) % 12) + 1 as mm
from minmax join
n
on minmax.minyyyymm + n.n <= minmax.maxyyyymm
)
select wxyz.yyyy, wxyz.mm, wxyz.w_id, wxyz.x_id, wxyz.y_id, wxyz.z_id,
<columns from t here>
from wxyz left join
t
on wxyz.w_id = t.w_id and wxyz.x_id = t.x_id and wxyz.y_id = t.y_id and
wxyz.z_id = t.z_id and wxyz.yyyy = t.yr and wxyz.mm = t.mnth;
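The month arithmetic is easy to get off by one, so it may help to check it in isolation. The sketch below (Python, with made-up helper names `to_index`, `from_index` and `months_between`) encodes a month as year*12 + month and decodes it again. Because the month is 1-based, the decoder subtracts 1 before dividing; otherwise December would land in the following year.

```python
def to_index(year, month):
    """Encode (year, month) as months since time 0, with month in 1..12."""
    return year * 12 + month


def from_index(index):
    """Decode a month index back to (year, month)."""
    # Shift to a 0-based month before dividing, then shift back.
    year, month0 = divmod(index - 1, 12)
    return year, month0 + 1


def months_between(first, last):
    """All (year, month) pairs from first to last, inclusive."""
    return [from_index(i) for i in range(to_index(*first), to_index(*last) + 1)]


# Range of the first id combination in the question: 2015/10 to 2016/03.
filled = months_between((2015, 10), (2016, 3))
print(filled)
```

The list includes 2015/11 and 2016/02, the two months missing from the sample data.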
Thank you for your help.
Your solution works, but I noticed it does not perform very well. In the meantime I have found a solution to my problem:
DECLARE @start_date DATE, @end_date DATE;
SET @start_date = (SELECT MIN(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) FROM Table_Input);
SET @end_date = (SELECT MAX(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) FROM Table_Input);
DECLARE @tdates TABLE (Period DATE, Yr INT, Mnth INT);
WHILE @start_date <= @end_date
BEGIN
INSERT INTO @tdates(Period, Yr, Mnth) VALUES(@start_date, YEAR(@start_date), MONTH(@start_date));
SET @start_date = EOMONTH(DATEADD(mm, 1, DATEFROMPARTS(YEAR(@start_date), MONTH(@start_date), 1)));
END
DECLARE @pks TABLE (W_ID NVARCHAR(50), X_ID NVARCHAR(50)
, Y_ID NVARCHAR(50), Z_ID NVARCHAR(50)
, PerMin DATE, PerMax DATE);
INSERT INTO @pks (W_ID, X_ID, Y_ID, Z_ID, PerMin, PerMax)
SELECT W_ID, X_ID, Y_ID, Z_ID
, MIN(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) AS PerMin
, MAX(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) AS PerMax
FROM Table_Input
GROUP BY W_ID, X_ID, Y_ID, Z_ID;
INSERT INTO Table_Output(W_ID, Yr, Mnth, X_ID, Y_ID, Z_ID
, ComprasLiquidas, RTV, DevManuais, ComprasBrutas, Vendas, Stock, ReceitasComerciais)
SELECT TP.W_ID, TP.Yr, TP.Mnth, TP.X_ID, TP.Y_ID, TP.Z_ID
, TA.ComprasLiquidas, TA.RTV, TA.DevManuais, TA.ComprasBrutas, TA.Vendas, TA.Stock, TA.ReceitasComerciais
FROM
(
SELECT W_ID, X_ID, Y_ID, Z_ID, Yr, Mnth
FROM @tdates CROSS JOIN @pks
WHERE Period BETWEEN PerMin AND PerMax
) AS TP
LEFT JOIN Table_Input AS TA
ON TP.W_ID = TA.W_ID AND TP.X_ID = TA.X_ID AND TP.Y_ID = TA.Y_ID
AND TP.Z_ID = TA.Z_ID
AND TP.Yr = TA.Yr
AND TP.Mnth = TA.Mnth
ORDER BY TP.W_ID, TP.X_ID, TP.Y_ID, TP.Z_ID, TP.Yr, TP.Mnth;
I do the following:
Get the Min and Max date of the entire table - @start_date and @end_date variables;
Create an auxiliary table with all month-end dates between Min and Max - @tdates table;
Get all the combinations of (W_ID, X_ID, Y_ID, Z_ID) along with the min and max dates of each combination - @pks table;
Create the cartesian product of @tdates and @pks, and in the WHERE clause keep only the periods between the Min and Max of the combination;
Compute a LEFT JOIN of the cartesian product table with the input data table.
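The five steps above can be sketched in plain Python to see what each intermediate table holds. This is only an illustration under stated assumptions: the names `table_input`, `periods`, `pks` and `output` mirror the T-SQL, there is a single measure column, and the Purchases values for W_ID 9 are placeholders because the question does not give them.

```python
# (W_ID, X_ID, Y_ID, Z_ID, Yr, Mnth, Purchases)
table_input = [
    (1, 5210, 1402, 2, 2015, 10, 1000.00),
    (1, 5210, 1402, 2, 2015, 12, 12000.00),
    (1, 5210, 1402, 2, 2016, 1, 1000.00),
    (9, 880, 2, 7, 2015, 5, 50.00),  # placeholder value
    (9, 880, 2, 7, 2015, 7, 70.00),  # placeholder value
]


def next_month(yr, mnth):
    return (yr + 1, 1) if mnth == 12 else (yr, mnth + 1)


# Steps 1 and 2: every (Yr, Mnth) between the global min and max.
periods = []
current = min((r[4], r[5]) for r in table_input)
last = max((r[4], r[5]) for r in table_input)
while current <= last:
    periods.append(current)
    current = next_month(*current)

# Step 3: min and max period per (W_ID, X_ID, Y_ID, Z_ID) combination.
pks = {}
for w, x, y, z, yr, mnth, _ in table_input:
    lo, hi = pks.get((w, x, y, z), ((yr, mnth), (yr, mnth)))
    pks[(w, x, y, z)] = (min(lo, (yr, mnth)), max(hi, (yr, mnth)))

# Steps 4 and 5: filtered cartesian product, then a left join on the input.
lookup = {r[:6]: r[6] for r in table_input}
output = [
    key + period + (lookup.get(key + period),)  # None plays the role of NULL
    for key, (lo, hi) in sorted(pks.items())
    for period in periods
    if lo <= period <= hi
]

for row in output:
    print(row)
```

The rows for 2015/11 (first combination) and 2015/06 (second combination) come out with `None` as the measure, which corresponds to the NULLs produced by the LEFT JOIN.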
I'll list only the one table that I need to query:
lodgings_Contract:
id_contract identity primary,
id_person int,
id_room varchar(4),
day_begin datetime,
day_end datetime,
day_register datetime,
money_per_month money
These are the values in table lodgings_Contract (this data is for example purposes only):
id_contract | id_person | id_room | day_begin | day_end | day_register | money_per_month
3 | 2 | 101 | 1/12/2014 | 27/2/2015 | 1/12/2015 | 100
2 | 1 | 102 | 1/1/2014 | 27/4/2014 | 1/1/2014 | 200
1 | 3 | 103 | 1/1/2014 | 27/3/2014 | 1/1/2014 | 300
*Person 1 rents room 102 for 4 months in 2014 at 200/month. Person 2 rents room 101 for 3 months (1 month in 2014 and 2 months in 2015) at 100/month. Person 3 rents room 103 for 3 months in 2014 at 300/month.
I want my result to display 3 fields: Month | Year | Incomes
Result :
Month | Year | Incomes
1 |2014| 500
2 |2014| 500
3 |2014| 500
4 |2014| 200
12 |2014| 100
1 |2015| 100
2 |2015| 100
Can I do that? Please help!
I posted another question before this one, but it was complicated and required 3 tables, so I made this one with only 1 table.
This is my code:
select month(day_begin)as 'Month',year(day_begin)as 'Year',money_per_month as 'Incomes'
from lodgings_Contract
group by day_begin,money_per_month
It only lists the first month of "day_begin". I have no idea how to do it right.
To get the results you first need a calendar table; in the following query it is created on the fly with a CTE.
That said, what is the purpose of the column day_register? It seems to be a copy of day_begin, probably with a typo for the contract with ID 3.
WITH Months(N) AS (
SELECT 1 UNION ALL Select 2 UNION ALL Select 3 UNION ALL Select 4
UNION ALL Select 5 UNION ALL Select 6 UNION ALL Select 7 UNION ALL Select 8
UNION ALL Select 9 UNION ALL Select 10 UNION ALL Select 11 UNION ALL Select 12
), Calendar(N) As (
SELECT CAST(2010 + y.N AS VARCHAR) + RIGHT('00' + Cast(m.N AS VARCHAR), 2)
FROM Months m
CROSS JOIN Months y
)
SELECT RIGHT(c.N, 2) [Month]
, LEFT(c.N, 4) [Year]
, SUM(money_per_month) Incomes
FROM lodgings_Contract lc
INNER JOIN Calendar c
ON c.N BETWEEN CONVERT(VARCHAR(6), lc.day_begin, 112)
AND CONVERT(VARCHAR(6), lc.day_end, 112)
GROUP BY c.N
The calendar CTE is small because I don't know how many years the real data spans. If there are many years, it is better to create a calendar table in your DB and use it instead of computing it every time.
The calendar CTE returns a list of months in the format yyyyMM.
In the main query, CONVERT(VARCHAR(6), lc.day_begin, 112) converts day_begin to the ISO format yyyyMMdd and keeps only the first six characters, so again yyyyMM. For example, for id_contract 3 we get 201412. The same applies to day_end.
If the contract actually begins on day_register, change lc.day_begin to lc.day_register.
SQLFiddle demo
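For readers who want to see why comparing yyyyMM strings works, here is a small Python sketch of the same join. It assumes the same calendar range as the CTE (2011 to 2022) and uses the three sample contracts; the variable names are made up for the illustration. Zero-padded yyyyMM strings sort in chronological order, so a plain string BETWEEN picks out the months a contract covers.

```python
from collections import defaultdict
from datetime import date

# (id_contract, day_begin, day_end, money_per_month) from the question.
contracts = [
    (3, date(2014, 12, 1), date(2015, 2, 27), 100),
    (2, date(2014, 1, 1), date(2014, 4, 27), 200),
    (1, date(2014, 1, 1), date(2014, 3, 27), 300),
]

# Calendar of yyyyMM strings, one per month, like the Calendar CTE.
calendar = [f"{year}{month:02d}" for year in range(2011, 2023) for month in range(1, 13)]

incomes = defaultdict(int)
for _, begin, end, money in contracts:
    lo, hi = begin.strftime("%Y%m"), end.strftime("%Y%m")
    for period in calendar:
        # Same test as c.N BETWEEN yyyyMM(day_begin) AND yyyyMM(day_end).
        if lo <= period <= hi:
            incomes[period] += money

# (Month, Year, Incomes), ordered chronologically.
result = [(int(p[4:]), int(p[:4]), total) for p, total in sorted(incomes.items())]
for row in result:
    print(row)
```

On the sample data this gives the seven rows listed in the question, from 1/2014 with 500 through 2/2015 with 100.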
I have a table that I need to use to build a result set from where certain rows from the table are columns in the result set. I started to chain LEFT JOINs together on the table multiple times but I need to eliminate results that are a different combination of another result already in the set:
For example, if I get 1, 21, 25 as result columns, I can't have ANY other combination of those numbers in the results.
My table definition is:
Table tblKPIDetails
Column Month int
Column Year int
Column Division varchar(3)
Column KPI int
Column Value decimal(18,4)
My current query is:
SELECT *
FROM tblKPIDetails J1
LEFT JOIN tblKPIDetails J2 ON J2.Month = J1.Month AND J2.Year = J1.Year AND J2.Division = J1.Division AND NOT(J2.KPI = J1.KPI ) AND (J2.KPI = 1 OR J2.KPI = 21 OR J2.KPI = 25)
LEFT JOIN tblKPIDetails J3 ON J3.Month = J1.Month AND J3.Year = J1.Year AND J3.Division = J1.Division AND NOT(J3.KPI = J1.KPI ) AND (J3.KPI = 1 OR J3.KPI = 21 OR J3.KPI = 25)
WHERE J1.KPI = 1 OR J1.KPI = 21 OR J1.KPI = 25
I know this is wrong, but it's a super-set of what I need. In the results from the query above, I can get J1.KPI, J2.KPI, J3.KPI or J1.KPI, J3.KPI, J2.KPI, or any other combination.
My expected result would be:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
for each division, month, and year
where KPIA, KPIB, or KPIC = 1, 21, or 25 but only 1 combination of 1,21,25 exists per division|month|year
EDIT
To clarify the expected results a little more, using the above query, I'm getting the following results:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
--------------------------------------------------------------------------------
000 1 2012 1 1000 21 2000 25 3000
000 1 2012 21 2000 1 1000 25 3000
000 1 2012 25 3000 21 2000 1 1000
111 1 2012 1 555 21 10000 25 5000
I need to make it so my results would only be ANY 1 of the first 3 results and then the last one...for example:
Division | Month | Year | KPIA | KPIAValue | KPIB | KPIBValue | KPIC | KPICValue
--------------------------------------------------------------------------------
000 1 2012 25 3000 21 2000 1 1000
111 1 2012 1 555 21 10000 25 5000
I think you are looking for the PIVOT table operator like so:
SELECT
Division,
Month,
Year,
[1] AS KPIAValue,
[21] AS KPIBValue,
[25] AS KPICValue
FROM
(
SELECT t1.*
FROM tblKPIDetails t1
INNER JOIN
(
SELECT Month, Year, Division
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Division
HAVING COUNT(DISTINCT KPI) = 3
) t2 ON t1.Month = t2.Month AND t1.Year = t2.Year
AND t1.Division = t2.Division
) t
PIVOT
(
MAX(Value)
FOR KPI IN([1], [21], [25])) p;
SQL Fiddle Demo
This will give you the data in the form:
| DIVISION | MONTH | YEAR | KPIAVALUE | KPIBVALUE | KPICVALUE |
---------------------------------------------------------------
| A | 2 | 2012 | 16 | 16 | 16 |
| B | 10 | 2012 | 16 | 18 | 20 |
Note that this returns only the combinations of Year, Month and Division that have all three values 1, 21 and 25. That is what this subquery does:
SELECT Month, Year, Division
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Division
HAVING COUNT(DISTINCT KPI) = 3
Update: If you are looking for the combinations that have at least one of 1, 21 or 25, just remove the HAVING COUNT(DISTINCT KPI) = 3. The pivot still returns only these three KPIs and ignores any other KPI values, and it returns NULL for whichever of the three is missing, like so:
SELECT
Division,
Month,
Year,
[1] AS KPIAValue,
[21] AS KPIBValue,
[25] AS KPICValue
FROM
(
SELECT t1.*
FROM tblKPIDetails t1
INNER JOIN
(
SELECT Month, Year, Division
FROM tblKPIDetails
WHERE KPI IN(1, 21, 25)
GROUP BY Month, Year, Division
) t2 ON t1.Month = t2.Month AND t1.Year = t2.Year
AND t1.Division = t2.Division
) t
PIVOT
(
MAX(Value)
FOR KPI IN([1], [21], [25])) p;
Updated SQL Fiddle Demo
| DIVISION | MONTH | YEAR | KPIAVALUE | KPIBVALUE | KPICVALUE |
---------------------------------------------------------------
| A | 2 | 2012 | 15.5 | 15.5 | 15.5 |
| B | 10 | 2012 | 15.5 | 17.5 | 20.24 |
| C | 12 | 2012 | 15.5 | (null) | 20.24 |
If you don't have a large number of "IDs", you could just transpose the values like this:
select
[Month],
[Year],
Division,
sum(case when KPI = 1 then Value else null end) as KPI1,
sum(case when KPI = 21 then Value else null end) as KPI21,
sum(case when KPI = 25 then Value else null end) as KPI25
from tblKPIDetails
group by
[Month],
[Year],
Division
order by
[Month],
[Year],
Division
Or same thing by using the "OVER" clause.
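As an illustration of what the conditional aggregation does, here is a short Python sketch of the same transposition. It uses the values from the question's example plus one extra KPI row that is hypothetical and only there to show that other KPIs are ignored. One difference from the SQL: a group with none of the wanted KPIs would show up as an all-NULL row in the query, while this sketch simply skips it.

```python
# (Division, Month, Year, KPI, Value)
rows = [
    ("000", 1, 2012, 1, 1000),
    ("000", 1, 2012, 21, 2000),
    ("000", 1, 2012, 25, 3000),
    ("111", 1, 2012, 1, 555),
    ("111", 1, 2012, 21, 10000),
    ("111", 1, 2012, 25, 5000),
    ("111", 1, 2012, 30, 42),  # hypothetical extra KPI, not pivoted
]

wanted = (1, 21, 25)

pivot = {}
for division, month, year, kpi, value in rows:
    if kpi not in wanted:
        continue
    # One slot per wanted KPI, starting as None (the SQL NULL).
    slot = pivot.setdefault((division, month, year), dict.fromkeys(wanted))
    # sum(case when KPI = k then Value end): add within the matching slot only.
    slot[kpi] = value if slot[kpi] is None else slot[kpi] + value

# One row per (Division, Month, Year), with the three KPI values as columns.
result = [key + tuple(slot[k] for k in wanted) for key, slot in sorted(pivot.items())]
for row in result:
    print(row)
```

Each division/month/year appears exactly once, which is the property the chained LEFT JOINs in the question could not guarantee.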
I think you want a conditional aggregation. But it is still not clear to me how the results are being defined. This might help you on your way:
SELECT Division, Month, Year,
1, max(case when kpi = 1 then value end) as kpi1value,
21, max(case when kpi = 21 then value end) as kpi21value,
25, max(case when kpi = 25 then value end) as kpi25value
FROM tblKPIDetails J1
GROUP BY Division, Month, Year
Maybe you can try the following:
SELECT DISTINCT
t.Division,
t.Month,
t.Year,
KA.Value AS KPIAValue,
KB.Value AS KPIBValue,
KC.Value AS KPICValue
FROM
tblKPIDetails t
LEFT JOIN tblKPIDetails KA ON t.Division = KA.Division and t.Month = KA.month and t.year = KA.year and KA.KPI = 1
LEFT JOIN tblKPIDetails KB ON t.Division = KB.Division and t.Month = KB.month and t.year = KB.year and KB.KPI = 21
LEFT JOIN tblKPIDetails KC ON t.Division = KC.Division and t.Month = KC.month and t.year = KC.year and KC.KPI = 25
That is one LEFT JOIN for each KPI value you want.
I have a table with week ranges (week number, start date, end date) and a table with tutorial dates for writing tutors (tutor ID, tutorial_date, tutorial type (A or B)).
I want to create two queries. The first should show the week ranges (week 1, week 2, ...) across the top and the tutor names down the side, with the count of tutorials of type "A" within that week's date range in each cell.
The result should look like this:
Counts of Tutorials of Type "A"
Tutor|Week One|Week Two|Week Three|Week Four|Total
Joe | 3 | 5 | 7 | 8 | 23
Sam | 2 | 4 | 3 | 8 | 17
Meaning that Joe completed 3 tutorials in week one, five in week two, 7 in week three, and 8 in week 4.
The second query should show totals for tutorial type "A" and type "B"
Tutor|Week One|Week Two|Week Three|Week Four|Total |
Joe | 3/1 | 5/3 | 7/2 | 8/2 | 23/8 |
Sam | 2/3 | 4/4 | 3/2 | 8/3 | 17/12 |
Here, in Week One, Joe has done 3 tutorials of type A and 1 of type B.
Sample table data for tutorials (week one)
Tutor | Tutorial_ID | Tutorial Date |Type|
------------------------------------------
Joe | 1 | 2011-01-01 | A |
Joe | 2 | 2011-01-02 | A |
Joe | 3 | 2011-01-03 | A |
Joe | 4 | 2011-01-03 | B |
Sam | 5 | 2011-01-01 | A |
Sam | 6 | 2011-01-02 | A |
Sam | 7 | 2011-01-03 | B |
The week table looks like this:
weekNumber |startDate |endDate
1 |2011-01-01|2011-01-15
I'd like to generate this in SQL Server 2005.
There are a few ways to do this.
For query one, where you only need to PIVOT on type 'A' then you can do just a PIVOT
select *
from
(
select w1.tutor
, w1.type
, wk.weeknumber
from w1
inner join wk
on w1.tutorialdate between wk.startdate and wk.enddate
where w1.type = 'a'
) x
pivot
(
count(type)
for weeknumber in ([1])
)p
See SQL Fiddle with Demo
Or you can use a Count() with a CASE statement.
select w1.tutor
, COUNT(CASE WHEN w1.type = 'A' THEN 1 ELSE null END) [Week One]
from w1
inner join wk
on w1.tutorialdate between wk.startdate and wk.enddate
group by w1.tutor
See SQL Fiddle with Demo
But for the second query, I would just use a Count() with a CASE
select w1.tutor
, Cast(COUNT(CASE WHEN w1.type = 'A' AND wk.weeknumber = 1 THEN 1 ELSE null END) as varchar(10))
+ ' / '
+ Cast(COUNT(CASE WHEN w1.type = 'B' AND wk.weeknumber = 1 THEN 1 ELSE null END) as varchar(10)) [Week One]
, Cast(COUNT(CASE WHEN w1.type = 'A' AND wk.weeknumber = 2 THEN 1 ELSE null END) as varchar(10))
+ ' / '
+ Cast(COUNT(CASE WHEN w1.type = 'B' AND wk.weeknumber = 2 THEN 1 ELSE null END) as varchar(10)) [Week Two]
from w1
inner join wk
on w1.tutorialdate between wk.startdate and wk.enddate
group by w1.tutor
See SQL Fiddle with Demo
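The COUNT-with-CASE logic can be tried out in a few lines of Python. The sketch below uses the sample rows and the single week from the question; the names `tutorials`, `weeks`, `cell` and `report` are made up for the illustration.

```python
from collections import Counter
from datetime import date

# (tutor, tutorial date, type) from the question's sample data.
tutorials = [
    ("Joe", date(2011, 1, 1), "A"),
    ("Joe", date(2011, 1, 2), "A"),
    ("Joe", date(2011, 1, 3), "A"),
    ("Joe", date(2011, 1, 3), "B"),
    ("Sam", date(2011, 1, 1), "A"),
    ("Sam", date(2011, 1, 2), "A"),
    ("Sam", date(2011, 1, 3), "B"),
]

# (weekNumber, startDate, endDate)
weeks = [(1, date(2011, 1, 1), date(2011, 1, 15))]

# Join on "tutorialdate between startdate and enddate", then count per cell.
counts = Counter()
for tutor, day, kind in tutorials:
    for number, start, end in weeks:
        if start <= day <= end:
            counts[(tutor, number, kind)] += 1


def cell(tutor, week):
    """Format one pivot cell as 'A count / B count'."""
    return f"{counts[(tutor, week, 'A')]} / {counts[(tutor, week, 'B')]}"


report = {tutor: cell(tutor, 1) for tutor in sorted({t for t, _, _ in tutorials})}
print(report)
```

A missing combination counts as 0 here, just as COUNT over a CASE that yields only NULLs returns 0 in the query.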
Edit: as AndriyM pointed out, the second query could also be done with a PIVOT. Here is a solution for the second query:
SELECT *
FROM
(
select distinct w1.tutor
, wk.weeknumber
, left(total, len(total)-1) Totals
FROM w1
inner join wk
on w1.tutorialdate between wk.startdate and wk.enddate
CROSS APPLY
(
SELECT cast(count(w2.type) as varchar(max)) + ' / '
from w1 w2
inner join wk wk2
on w2.tutorialdate between wk2.startdate and wk2.enddate
WHERE w2.tutor = w1.tutor
AND wk2.weeknumber = wk.weeknumber
group by w2.tutor, wk2.weeknumber, w2.type
FOR XML PATH('')
) D ( total )
) x
PIVOT
(
min(totals)
for weeknumber in ([1], [2])
) p
See SQL Fiddle with Demo