Cumulative SUM based on columns - sql

I have a table with values like this:
I want a cumulative sum based on the ID and year. For example, for id 1 and year 2010 the sum of records is 2.
For id 2 and year 2010 the sum of records is 1, and
for id 2 and year 2011 it is 1 + 1 = 2, i.e. I need a running total for each id, in ascending order by year.
Similarly, for id 3 the sum is 1 and for id 4 it is 1. For id 5 it is 3 for 2014; for 2015 it is the count of the previous year plus the count of the current year, i.e. 3 + 1 = 4; and for 2016 it is 3 + 1 + 1 = 5. How can this be done? Could someone please help?

No need to make things more complicated than they need to be...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
ID INT NOT NULL,
[Year] INT NOT NULL
);
INSERT #TestData (ID, Year) VALUES
(1, 2010), (1, 2010), (2, 2010), (2, 2011),
(3, 2012), (4, 2013), (5, 2014), (5, 2014),
(5, 2014), (5, 2015), (5, 2016);
--=======================================
SELECT
tdg.ID,
tdg.Year,
RunningCount = SUM(tdg.Cnt) OVER (PARTITION BY tdg.ID ORDER BY tdg.Year ROWS UNBOUNDED PRECEDING)
FROM (
SELECT td.ID, td.Year, Cnt = COUNT(1)
FROM #TestData td
GROUP BY td.ID, td.Year
) tdg;
Results...
ID Year RunningCount
----------- ----------- ------------
1 2010 2
2 2010 1
2 2011 2
3 2012 1
4 2013 1
5 2014 3
5 2015 4
5 2016 5
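
The same two-step idea (count per ID and year, then accumulate per ID) can be tried outside SQL Server. This is only a sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later for window functions; the table name test_data and the function name running_count are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (id, year)
ROWS = [(1, 2010), (1, 2010), (2, 2010), (2, 2011), (3, 2012), (4, 2013),
        (5, 2014), (5, 2014), (5, 2014), (5, 2015), (5, 2016)]


def running_count(rows):
    """Count rows per (id, year), then keep a running total of those counts per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_data (id INTEGER NOT NULL, year INTEGER NOT NULL)")
    con.executemany("INSERT INTO test_data (id, year) VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT g.id, g.year,
               SUM(g.cnt) OVER (PARTITION BY g.id ORDER BY g.year
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS running_count
        FROM (SELECT id, year, COUNT(*) AS cnt      -- one row per (id, year)
              FROM test_data
              GROUP BY id, year) AS g
        ORDER BY g.id, g.year
    """).fetchall()
    con.close()
    return result


for row in running_count(ROWS):
    print(row)
```

The inner GROUP BY is what makes each (id, year) pair unique, so the ROWS frame in the outer query adds exactly one count per year.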

This is more nesting than I would like, and I feel there is a better way to do this with maybe only one window function, but I can't get past not having a unique row in your data.
SELECT id,
year ,sum(c) OVER (
PARTITION BY id ORDER BY year rows unbounded preceding
)
FROM (
SELECT id,
year,
count(rn) c
FROM (
SELECT id,
year,
row_number() OVER (
ORDER BY year
) AS rn
FROM your_table -- you will need to change this to your table
) a
GROUP BY id,
year
) a
What we do is first add a row number to the data so that every row is unique. After that we count those row numbers per id and year, and then use a window function to keep a running total of the counts per id, ordered by year.
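
On the wish for a single window function: one possible way, sketched here under assumptions, is to rely on the default window frame. When ORDER BY is given without an explicit frame, the frame is RANGE from the start of the partition to the current row, which includes all rows tied on the same year, so COUNT(*) already returns the running count and DISTINCT collapses the duplicates. The sketch uses Python's sqlite3 module (SQLite 3.25+) rather than SQL Server, and the function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, year)
ROWS = [(1, 2010), (1, 2010), (2, 2010), (2, 2011), (3, 2012), (4, 2013),
        (5, 2014), (5, 2014), (5, 2014), (5, 2015), (5, 2016)]


def running_count_single_window(rows):
    """Running count per id using one window function and no derived table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (id INTEGER, year INTEGER)")
    con.executemany("INSERT INTO your_table (id, year) VALUES (?, ?)", rows)
    result = con.execute("""
        -- No frame clause: the default RANGE frame counts every row up to and
        -- including all rows that share the current row's year.
        SELECT DISTINCT id, year,
               COUNT(*) OVER (PARTITION BY id ORDER BY year) AS running_count
        FROM your_table
        ORDER BY id, year
    """).fetchall()
    con.close()
    return result


for row in running_count_single_window(ROWS):
    print(row)
```

Note that this depends on the RANGE default; with an explicit ROWS frame the duplicate years would each get a different count and DISTINCT would no longer collapse them.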

There are many ways to do this. Here is one of them, with an inner query:
create table #table_name
(
UserID int,
Year int
)
INSERT INTO #table_name (UserID, Year)
VALUES
(1, 2010)
,(1,2010)
,(2,2010)
,(2,2011)
,(3,2012)
,(4,2013)
,(5,2014)
,(5,2014)
,(5,2014)
,(5,2015)
,(5,2016)
SELECT
UserID
,YEAR
,(SELECT COUNT(Year) FROM #table_name WHERE Year <= tt.Year AND UserID = tt.UserID)
FROM
#table_name AS tt
GROUP BY UserID, Year
You can also use ROW_NUMBER() OVER (edit: see the answer below for this technique; I think it is a little too complicated for such a simple task). The query above returns your required output:
+--------+------+-------+
| UserID | Year | COUNT |
+--------+------+-------+
| 1 | 2010 | 2 |
| 2 | 2010 | 1 |
| 2 | 2011 | 2 |
| 3 | 2012 | 1 |
| 4 | 2013 | 1 |
| 5 | 2014 | 3 |
| 5 | 2015 | 4 |
| 5 | 2016 | 5 |
+--------+------+-------+

Related

How to create a field which shows a custom order based on two fields?

I am trying to create a field which shows an order based on two columns. I have one column with a code in and one with a date. There are many dates for each code, but I am trying to pick out the latest date for each code. The table below shows the two columns I have and the order column that I need to create.
code date order column
1 10/04/22 3
1 11/04/22 2
1 14/05/22 1
2 10/04/22 2
2 15/04/22 1
3 11/04/22 1
4 12/04/22 2
4 16/04/22 1
5 15/04/22 2
5 17/04/22 1
As Larnu and Sean have already stated, Row_number is your friend here.
Start with the data:
CREATE TABLE #Table (code int, date date)
INSERT INTO #table
VALUES
(1, '04/10/22')
,(1, '04/11/22')
,(1, '05/14/22')
,(2, '04/10/22')
,(2, '04/15/22')
,(3, '04/11/22')
,(4, '04/12/22')
,(4, '04/16/22')
,(5, '04/15/22')
,(5, '04/17/22');
Then we write the query with the row numbers. The magic here is in the partition by/order by. That partitions your data based on the code, so it takes the three 1s and puts the dates in descending order. It then numbers them 1, 2, 3 with the latest date being number 1. Then it does code 2...
SELECT code
, date
, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date desc) rn
FROM #table
GROUP BY code, date
ORDER BY code asc, date asc;
And that gets us the result you asked for:
|code | date | rn |
|:----|:---------|:----|
| 1 |2022-04-10| 3 |
| 1 |2022-04-11| 2 |
| 1 |2022-05-14| 1 |
| 2 |2022-04-10| 2 |
| 2 |2022-04-15| 1 |
| 3 |2022-04-11| 1 |
| 4 |2022-04-12| 2 |
| 4 |2022-04-16| 1 |
| 5 |2022-04-15| 2 |
| 5 |2022-04-17| 1 |
And then if you only want the max date for each code... keep only the ones where row number equals 1.
WITH CTE AS
(SELECT code
, date
, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date desc) rn
FROM #table
)
SELECT code
, date
FROM CTE
WHERE rn = 1
ORDER BY code
| code | date |
|:-----|:---------|
| 1 |2022-05-14|
| 2 |2022-04-15|
| 3 |2022-04-11|
| 4 |2022-04-16|
| 5 |2022-04-17|
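
For anyone who wants to try the "keep rn = 1" step without SQL Server, here is a small sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25+), dates stored as ISO text so that they sort correctly, and a made-up table name events.

```python
import sqlite3

# (code, date) pairs from the question, written as ISO dates
EVENTS = [(1, "2022-04-10"), (1, "2022-04-11"), (1, "2022-05-14"),
          (2, "2022-04-10"), (2, "2022-04-15"), (3, "2022-04-11"),
          (4, "2022-04-12"), (4, "2022-04-16"),
          (5, "2022-04-15"), (5, "2022-04-17")]


def latest_per_code(rows):
    """Return the latest date for each code by keeping row number 1 per partition."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (code INTEGER, date TEXT)")
    con.executemany("INSERT INTO events (code, date) VALUES (?, ?)", rows)
    result = con.execute("""
        WITH ranked AS (
            SELECT code, date,
                   ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) AS rn
            FROM events
        )
        SELECT code, date
        FROM ranked
        WHERE rn = 1          -- rn = 1 is the newest date within each code
        ORDER BY code
    """).fetchall()
    con.close()
    return result


for row in latest_per_code(EVENTS):
    print(row)
```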

How to calculate occurrence depending on months/years

My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID in a Year per Month.
But what is important, If some record has a Start date in the following month compared to the End date (of course from the same year) then occurrence should be counted for both months [e.g. ID 1 in the 3rd row has a situation like that. So in this situation, the occurrence for this ID should be +1 for January and +1 for February].
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created few intermediate tables. If you wish you can use sub-query or CTE depending on the permissions
I have taken care of 2 scenarios you mentioned (whether to count it as 1 occurrence or 2 occurrence) as you explained
Query:
First, create a table with a flag that records whether the start and end dates fall in the same year and month (1 means yes, 2 means no):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the above table is 1 if the year and month are the same for start and end, and 2 if the year is the same but the month differs. You can add more flag categories if you have more scenarios.
Second, count the occurrences for flag 1. Since the year and month are the same for flag 1, we can take either date. I have taken start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the month differs between the two dates, we can combine them with a union before counting, so that both dates end up in the same column:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL /* a plain UNION would drop duplicate (ID, year, month) rows before they are counted */
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query is your expected output. I know this is not an optimized solution, but it works. I will update if I come across a more optimized strategy. Comment if you have questions.
Not sure if this works in Spark SQL.
But if the ranges never span more than two adjacent months, you can just add the extra occurrences to the count via a UNION ALL.
The extras are the rows whose end falls in a later month than the start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
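
The UNION ALL idea can be checked quickly outside Spark. The following is a rough sketch, assuming Python's built-in sqlite3 module, ISO date strings, and renamed columns start_date and end_date (END is a keyword in SQLite). It compares year-month strings rather than month numbers, which is a small variation so that a range crossing a year boundary is also counted twice.

```python
import sqlite3

# (id, start, end) rows from the question
SOURCE = [(1, "2010-01-02", "2010-01-04"), (1, "2010-01-22", "2010-01-24"),
          (1, "2011-01-31", "2011-02-02"), (2, "2012-05-02", "2012-05-08"),
          (3, "2013-01-02", "2013-01-03"), (4, "2010-09-15", "2010-09-20"),
          (4, "2010-09-30", "2010-10-05")]


def monthly_occurrences(rows):
    """Count rows per (year, month, id); a row ending in a later month counts in both."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE source (id INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO source VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT CAST(strftime('%Y', d) AS INTEGER) AS year_occ,
               CAST(strftime('%m', d) AS INTEGER) AS month_occ,
               id,
               COUNT(*) AS occurrence
        FROM (
            SELECT id, start_date AS d FROM source
            UNION ALL                       -- keep duplicates, they must be counted
            SELECT id, end_date AS d FROM source
            WHERE strftime('%Y-%m', end_date) > strftime('%Y-%m', start_date)
        ) AS q
        GROUP BY year_occ, month_occ, id
        ORDER BY year_occ, month_occ, id
    """).fetchall()
    con.close()
    return result


for row in monthly_occurrences(SOURCE):
    print(row)
```

Like the query above, this only adds the end month, so a range covering three or more months would still miss the months in between.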

SQL Server : sum for every four years

I have the table in SQL Server:
Table (Id, Year, Value)
and data like this:
Id | Year | Value
---+------+------
1 | 1993 | 5
2 | 1994 | 1
3 | 1995 | 2
4 | 1996 | 15
5 | 1997 | 8
6 | 1998 | 3
7 | 1999 | 1
8 | 2000 | 5
I need a sum for every four years, for example
Years | SUM (Value)
----------+-------------
1993-1996 | 23
1997-2000 | 17
How can I do this?
You can use simple arithmetic:
select min(year), max(year), sum(value)
from t
group by (year - 1) / 4
order by min(year);
Notes:
This puts the years into two columns. You can concatenate them if you really want a string.
This takes advantage of the fact that SQL Server does integer division.
Note the order by, so the results are in your expected order.
You can play with the group by clause to get the specific grouping you want.
declare @StartYear int = 1993;
declare @YearsPerGroup int = 4;
select
case
when min([year]) = max([year]) then cast(min([year]) as varchar(10))
else cast(min([year]) as varchar(10)) + '-' + cast(max([year]) as varchar(10))
end as [Name],
min([year]) as [Start Year],
max([year]) as [End Year],
sum(Value) as [Sum (Value)]
from T
group by
([year] - @StartYear) - ([year] - @StartYear) % @YearsPerGroup
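
The bucketing arithmetic is easy to check in isolation. Here is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite also does integer division on integers) and hypothetical parameter names start_year and years_per_group.

```python
import sqlite3

# (id, year, value) rows from the question
DATA = [(1, 1993, 5), (2, 1994, 1), (3, 1995, 2), (4, 1996, 15),
        (5, 1997, 8), (6, 1998, 3), (7, 1999, 1), (8, 2000, 5)]


def sum_by_year_groups(rows, start_year=1993, years_per_group=4):
    """Sum value over consecutive blocks of years_per_group years from start_year."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, year INTEGER, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT MIN(year), MAX(year), SUM(value)
        FROM t
        GROUP BY (year - ?) / ?     -- integer division gives the block number
        ORDER BY MIN(year)
    """, (start_year, years_per_group)).fetchall()
    con.close()
    return result


for first, last, total in sum_by_year_groups(DATA):
    print(f"{first}-{last}", total)
```

With the sample data, 1993 to 1996 all map to block 0 and 1997 to 2000 to block 1, which is where the two sums come from.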

Retrieve rows in SQL based on Maximum value in multiple columns

I have a SQL table with the following fields:
Company ID
Company Name
Fiscal Year
Fiscal Quarter
There are multiple records for various fiscal years and fiscal quarters for each company. I want to retrieve the rows for each company based on Maximum Fiscal Year and Maximum Fiscal Quarter. For example, if the table has the following:
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2017 | 1
1 | Test1 | 2017 | 2
1 | Test1 | 2018 | 1
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 3
2 | Test2 | 2018 | 4
The query should return the following (Only the record with the maximum fiscal year and maximum fiscal quarter for that year):
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 4
I am able to use the below query to get the records with the maximum fiscal year but not sure how to further select the maximum quarter within the year:
SELECT fp.companyId, fp.companyname, fp.fiscalyear,fp.fiscalquarter
FROM dbo.ciqFinPeriod fp
LEFT OUTER JOIN dbo.ciqFinPeriod fp2
ON (fp.companyId = fp2.companyId AND fp.fiscalyear < fp2.fiscalyear)
WHERE fp2.companyId IS NULL
Thank you so much for any assistance!
If you have a list of companies, I would simply do:
select fp.*
from Companies c outer apply
(select top (1) fp.*
from dbo.ciqFinPeriod fp
where fp.companyId = c.companyId
order by fp.fiscalyear desc, fp.fiscalquarter desc
) fp;
If not, then row_number() is probably the simplest method:
select fp.*
from (select fp.*,
row_number() over (partition by fp.companyId order by fp.fiscalyear desc, fp.fiscalquarter desc) as seqnum
from dbo.ciqFinPeriod fp
) fp
where seqnum = 1;
Or the somewhat more abstruse (clever ?):
select top (1) with ties fp.*
from dbo.ciqFinPeriod fp
order by row_number() over (partition by fp.companyId order by fp.fiscalyear desc, fp.fiscalquarter desc)
I've had some success with the following, same output as you.
create table #table
(
CompanyID int,
CompanyName varchar(200),
Year int,
Quater int
)
insert into #table (CompanyID,CompanyName,Year,Quater)
VALUES
('1','Test1','2017','1'),
('1','Test1','2017','2'),
('1','Test1','2018','1'),
('1','Test1','2018','2'),
('2','Test2','2018','3'),
('2','Test2','2018','4')
SELECT CompanyID,CompanyName,Year,Quater
FROM
(
Select CompanyID,CompanyName,Year,Quater
, ROW_NUMBER() OVER(PARTITION BY CompanyID ORDER BY Year desc,Quater DESC)
as RowNum
from #table
) X WHERE RowNum = 1
drop table #table
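select companyId, companyname, max(fiscalyear), max(fiscalquarter)
from dbo.ciqFinPeriod
group by companyId, companyname
Note that the two maximums are computed independently, so this only matches the expected output when the highest quarter overall also occurs in the latest year.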
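
To see why taking MAX of each column separately can differ from ordering by year and then quarter, here is a small sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25+), made-up table and column names, and sample data altered slightly from the question (company 1 has a 2017 Q4 row) so that the difference shows up.

```python
import sqlite3

# (company_id, company_name, fiscal_year, fiscal_quarter); 2017 Q4 added on purpose
PERIODS = [(1, "Test1", 2017, 4), (1, "Test1", 2018, 1), (1, "Test1", 2018, 2),
           (2, "Test2", 2018, 3), (2, "Test2", 2018, 4)]


def _load(rows):
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE fin_period (company_id INTEGER, company_name TEXT,
                                            fiscal_year INTEGER, fiscal_quarter INTEGER)""")
    con.executemany("INSERT INTO fin_period VALUES (?, ?, ?, ?)", rows)
    return con


def independent_max(rows):
    """MAX of year and MAX of quarter, each computed on its own."""
    con = _load(rows)
    result = con.execute("""
        SELECT company_id, company_name, MAX(fiscal_year), MAX(fiscal_quarter)
        FROM fin_period
        GROUP BY company_id, company_name
        ORDER BY company_id
    """).fetchall()
    con.close()
    return result


def latest_period(rows):
    """Latest year, then latest quarter within that year, via ROW_NUMBER."""
    con = _load(rows)
    result = con.execute("""
        SELECT company_id, company_name, fiscal_year, fiscal_quarter
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY company_id
                                           ORDER BY fiscal_year DESC,
                                                    fiscal_quarter DESC) AS rn
              FROM fin_period) AS x
        WHERE rn = 1
        ORDER BY company_id
    """).fetchall()
    con.close()
    return result


print(independent_max(PERIODS))
print(latest_period(PERIODS))
```

For company 1 the first query reports 2018 with quarter 4, a combination that is not in the table, while the ROW_NUMBER version returns the actual 2018 Q2 row.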

How to do a count including not existing records?

How to do a count including not existing records, which should have '0' as the count?
Here is my table:
CREATE TABLE SURVEY
(year CHAR(4),
cust CHAR(2));
INSERT INTO SURVEY VALUES ('2011', 'AZ');
INSERT INTO SURVEY VALUES ('2011', 'CO');
INSERT INTO SURVEY VALUES ('2012', 'ME');
INSERT INTO SURVEY VALUES ('2014', 'ME');
INSERT INTO SURVEY VALUES ('2014', 'CO');
INSERT INTO SURVEY VALUES ('2014', 'ME');
INSERT INTO SURVEY VALUES ('2014', 'CO');
I've tried this, but of course it is missing zero counts:
select cust, year, count(*) as count from SURVEY
group by cust, year
I want to have this result:
+------+---------+--------+
| cust | year | count |
+------+---------+--------+
| AZ | 2011 | 1 |
| AZ | 2012 | 0 |
| AZ | 2014 | 0 |
| CO | 2011 | 1 |
| CO | 2012 | 0 |
| CO | 2014 | 2 |
| ME | 2011 | 0 |
| ME | 2012 | 1 |
| ME | 2014 | 2 |
+------+---------+--------+
please note:
My table has many records (~10k with different 'cust')
years may not be sequential (for example 2013 is skipped)
over time I may have 2015, 2016 and so on
the actual query will be executed in MS Access 2010 (not sure if that matters)
Please help, thank you!
It sounds like you want a count for every cust x year combination, with a zero when no survey record exists. If this is the case you will need two more tables, customers and years, and then do something like:
select leftside.cust, leftside.year, count(survey.cust) from
(select * from customers, years) as leftside left join survey
on leftside.cust = survey.cust and
leftside.year = survey.year
group by leftside.cust, leftside.year
select cust, year, (select count(cust) from survey) as count
from SURVEY
group by cust, year
But this query will return count of all records, without group condition.
If you have a domain table for years and customers:
select y.year, c.cust, count(s.year) as cnt
from customer as c
cross join year as y
left join survey as s
on s.year = y.year
and s.cust = c.cust
group by y.year, c.cust
If MS Access doesn't have cross join, you can do the same with:
from customer as c
join year as y
on 1 = 1
If you don't have domain tables you will somehow need to "invent" the domains, since you can't create something from nothing.
If you have domain tables as others said, well and good. If you have to depend only on data in your table, the below query will do that for you.
select cp.cust, cp.year, iif(isnull(sum(cnt)), 0, sum(cnt)) as count from
(select * from (
(select distinct cust from survey) as c cross join
(select distinct year from survey) as y)
) cp left join
(select *, 1 as cnt from survey) s on cp.cust=s.cust and cp.year=s.year
group by cp.cust, cp.year
order by cp.cust,cp.year
Instead of iif(isnull(sum(cnt)), 0, sum(cnt)), you can use coalesce(sum(cnt), 0) if your database supports it. MS Access needs the iif form; most other databases support coalesce.
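
The "invent the domains from the data" approach can be sketched as follows. This assumes Python's built-in sqlite3 module rather than MS Access, so the syntax (CROSS JOIN, counting a nullable column) is SQLite's and may need adapting; the function name is hypothetical.

```python
import sqlite3

# (year, cust) rows from the question
SURVEY = [("2011", "AZ"), ("2011", "CO"), ("2012", "ME"), ("2014", "ME"),
          ("2014", "CO"), ("2014", "ME"), ("2014", "CO")]


def counts_with_zeros(rows):
    """Count survey rows for every cust x year combination, including zero counts."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE survey (year TEXT, cust TEXT)")
    con.executemany("INSERT INTO survey (year, cust) VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT c.cust, y.year, COUNT(s.cust) AS cnt   -- COUNT skips the NULLs
        FROM (SELECT DISTINCT cust FROM survey) AS c
        CROSS JOIN (SELECT DISTINCT year FROM survey) AS y
        LEFT JOIN survey AS s
               ON s.cust = c.cust AND s.year = y.year
        GROUP BY c.cust, y.year
        ORDER BY c.cust, y.year
    """).fetchall()
    con.close()
    return result


for row in counts_with_zeros(SURVEY):
    print(row)
```

Counting s.cust instead of * is what turns the unmatched combinations into 0, because the left join fills s.cust with NULL for them.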