SQL GROUP BY month and year, returning the id only for single-row groups

I have a table with month, year and id columns, like this:
+-------+------+----+
| month | year | id |
+-------+------+----+
| 1     | 2016 | 1  |
| 2     | 2016 | 2  |
| 2     | 2016 | 3  |
+-------+------+----+
I want a SQL query that returns only one row per month/year: the id if that month/year has a single row, or a NULL id if it has multiple rows.
For the data above, the result should be:
+-------+------+------+
| month | year | id   |
+-------+------+------+
| 1     | 2016 | 1    |
| 2     | 2016 | null |
+-------+------+------+
How can I write this query in SQL Server 2012?

You can do this with aggregation and case:
select month, year,
(case when min(id) = max(id) then min(id) end) as id
from t
group by month, year;
Note: month and year are bad names for columns, because they are reserved words. If these are really the names of your columns, you will need to escape them.
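One quick way to check the MIN/MAX idea without a SQL Server 2012 instance is to run the same GROUP BY through Python's built-in sqlite3 module. This is a minimal sketch under that assumption: SQLite stands in for SQL Server, and the table t and the helper one_id_per_month are illustrative names only.

```python
import sqlite3


def one_id_per_month(rows):
    """Hypothetical helper: one row per (month, year); id is None when the group holds different ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (month INTEGER, year INTEGER, id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT month, year,
               -- CASE without ELSE yields NULL when MIN and MAX differ
               CASE WHEN MIN(id) = MAX(id) THEN MIN(id) END AS id
        FROM t
        GROUP BY month, year
        ORDER BY year, month
        """
    ).fetchall()
    con.close()
    return result


sample = [(1, 2016, 1), (2, 2016, 2), (2, 2016, 3)]
print(one_id_per_month(sample))  # [(1, 2016, 1), (2, 2016, None)]
```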

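**Question:** only one row per month/year with the id if it is single, or the id null if there are multiple rows for that month/year
with cte
as
(
select row_number() over (partition by month, year order by (select 1)) as rn,
count(*) over (partition by month, year) as cnt, -- rows in this month/year
month, year, id
from
[table] -- replace with your table name
)
select
case when cnt > 1 then null else id end as id,
month,
year
from
cte
where rn = 1; -- keep only one row per month/year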

Use a GROUP BY that counts the rows for each month and year. A CASE expression returns min(id) when there is exactly one row, and NULL otherwise (more than one row).
select month, year, case when count(*) = 1 then min(id) else null end as id
from tablename
group by month, year;
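The COUNT(*) = 1 test here and the MIN(id) = MAX(id) test in the first answer agree on the question's data, but they diverge when the same id is stored twice for one month: COUNT(*) = 1 returns NULL, while MIN = MAX returns that id. The sketch below puts both side by side, again using Python's sqlite3 as a stand-in for SQL Server; the extra March rows and the helper compare_rules are hypothetical.

```python
import sqlite3


def compare_rules(rows):
    """Hypothetical helper: evaluate both CASE rules for every (month, year) group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (month INTEGER, year INTEGER, id INTEGER)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT month, year,
               -- NULL unless the group has exactly one row
               CASE WHEN COUNT(*) = 1 THEN MIN(id) END AS by_count,
               -- NULL unless every row in the group carries the same id
               CASE WHEN MIN(id) = MAX(id) THEN MIN(id) END AS by_min_max
        FROM tablename
        GROUP BY month, year
        ORDER BY year, month
        """
    ).fetchall()
    con.close()
    return result


# Hypothetical data: id 4 is stored twice for March 2016.
rows = [(1, 2016, 1), (2, 2016, 2), (2, 2016, 3), (3, 2016, 4), (3, 2016, 4)]
for row in compare_rules(rows):
    print(row)
```

Which rule is right depends on whether a repeated id should still count as a "single" row for that month.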

I missed the part about the NULL in the OP; here's an adjusted query:
DECLARE @tbl TABLE
(
Month INT NOT NULL,
Year INT NOT NULL,
Id INT NOT NULL
)
INSERT INTO @tbl (Month, Year, Id) VALUES
(1, 2016, 1)
,(2, 2016, 2)
,(2, 2016, 3)
SELECT
Month
,Year
,CASE WHEN COUNT(Id) > 1 THEN NULL ELSE MIN(Id) END AS [Id] -- return the id itself, not its count
FROM @tbl
GROUP BY Month, Year
http://rextester.com/FWVHI57098

Related

How to calculate occurrence depending on months/years

My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID in a Year per Month.
What is important: if a record's End date falls in the month after its Start date (within the same year, of course), the occurrence should be counted for both months. For example, ID 1 in the 3rd row is such a case, so it should count +1 for January and +1 for February.
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
So far I have only created this:
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
I don't know how to proceed from here, and I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you prefer, you can use subqueries or CTEs instead, depending on your permissions.
I have handled both scenarios you mentioned (counting a record as one occurrence or as two), as you explained.
Query:
First, create a table with a flag that records whether the start and end dates fall in the same year and month (1 means same year and month, 2 means same year but different months, 0 means different years):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the table above is 1 if the start and end dates share the same year and month, 2 if they share the year but not the month, and 0 otherwise. You can add more flag categories if you have more scenarios.
Second, count the occurrences for flag 1. Since the year and month are the same for flag 1, we can use either date; I have used the start date:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the months differ for the two dates, we combine them with UNION ALL before counting, so that both dates end up in the same column:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
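To see why the set operator matters in the flag-2 step, here is a small sketch assuming two identical hypothetical records for ID 1 that both span January to February 2011. It runs through Python's sqlite3 rather than Spark SQL, and the flagged table keeps only the columns this step needs (no flag column). Plain UNION removes duplicate rows before the GROUP BY, so each month is counted once; UNION ALL keeps both records.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE flagged (id INTEGER, year_st INTEGER, month_st INTEGER,"
    " year_end INTEGER, month_end INTEGER)"
)
# Two hypothetical flag-2 records for the same id, both spanning Jan -> Feb 2011.
con.executemany(
    "INSERT INTO flagged VALUES (?, ?, ?, ?, ?)",
    [(1, 2011, 1, 2011, 2), (1, 2011, 1, 2011, 2)],
)


def count_months(set_operator):
    """Hypothetical helper: stack start and end months, then count per id/year/month."""
    sql = f"""
        SELECT id, year_dt, month_dt, COUNT(*) AS occurrence
        FROM (
            SELECT id, year_st AS year_dt, month_st AS month_dt FROM flagged
            {set_operator}
            SELECT id, year_end AS year_dt, month_end AS month_dt FROM flagged
        ) AS unioned
        GROUP BY id, year_dt, month_dt
        ORDER BY year_dt, month_dt
    """
    return con.execute(sql).fetchall()


print(count_months("UNION"))      # [(1, 2011, 1, 1), (1, 2011, 2, 1)]
print(count_months("UNION ALL"))  # [(1, 2011, 1, 2), (1, 2011, 2, 2)]
```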
Finally, we just SUM the occurrences from both flag tables. Note that we use UNION ALL here to combine the two tables; this is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the query above is your expected output. I know this is not optimized, but it works. I will update this if I come across a more optimized strategy. Comment if you have questions.
db<>fiddle link here
Not sure if this works in Spark SQL.
But if no range extends beyond the month after its start, you can just add the extra occurrences to the count via a UNION ALL.
The extra rows are those whose end falls in a later month than their start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
db<>fiddle here
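The UNION ALL idea can be checked against the question's sample rows. This sketch uses Python's sqlite3 instead of Spark SQL, so it assumes a few substitutions: strftime() replaces YEAR()/MONTH(), the columns are renamed start_date and end_date because END is an SQL keyword, and the extra-month test compares 'YYYY-MM' strings instead of bare month numbers, which also catches ranges that cross from December into January. Like the answer, it still assumes no range spans more than two calendar months.

```python
import sqlite3

# The question's sample rows (Start/End renamed to start_date/end_date).
rows = [
    (1, "2010-01-02", "2010-01-04"),
    (1, "2010-01-22", "2010-01-24"),
    (1, "2011-01-31", "2011-02-02"),
    (2, "2012-05-02", "2012-05-08"),
    (3, "2013-01-02", "2013-01-03"),
    (4, "2010-09-15", "2010-09-20"),
    (4, "2010-09-30", "2010-10-05"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (id INTEGER, start_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)", rows)

query = """
SELECT year_occ, month_occ, id, COUNT(*) AS occurrence
FROM (
    -- every record counts once in the month it starts
    SELECT id,
           CAST(strftime('%Y', start_date) AS INTEGER) AS year_occ,
           CAST(strftime('%m', start_date) AS INTEGER) AS month_occ
    FROM source
    UNION ALL
    -- records ending in a later month also count once in the end month
    SELECT id,
           CAST(strftime('%Y', end_date) AS INTEGER),
           CAST(strftime('%m', end_date) AS INTEGER)
    FROM source
    WHERE strftime('%Y-%m', start_date) < strftime('%Y-%m', end_date)
) q
GROUP BY year_occ, month_occ, id
ORDER BY year_occ, month_occ, id
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```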

Find Customers With 4 Consecutive Years of Giving (Including Gaps)

I have a table similar to below:
+------------+-----------+
| CustomerID | OrderYear |
+------------+-----------+
| 1 | 2012 |
| 1 | 2013 |
| 1 | 2014 |
| 1 | 2017 |
| 1 | 2018 |
| 2 | 2012 |
| 2 | 2013 |
| 2 | 2014 |
| 2 | 2015 |
| 2 | 2017 |
+------------+-----------+
How would I identify which CustomerIDs have 4 consecutive years of giving? (In the above, only customer 2.) As you can see, some records will have gaps in order years.
I started down the road of trying to use some combination of ROW_NUMBER/LAG/LEAD, with no luck so far.
Very pared-down/modified attempt...
WITH CTE
AS
(
SELECT T.ConstituentLookupID,
T.FISCALYEAR,
COUNT(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID) AS YearCount,
FIRST_VALUE(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID ORDER BY T.FISCALYEAR DESC) - T.FISCALYEAR + 1 AS X,
ROW_NUMBER() OVER (PARTITION BY T.ConstituentLookupID ORDER BY T.FISCALYEAR DESC) AS RN
FROM #Temp AS T)
SELECT CTE.ConstituentLookupID,
CTE.FISCALYEAR,
CTE.YearCount,
CTE.X,
CTE.RN
FROM CTE
WHERE CTE.YearCount >= 4 --Have at least 4 years of giving
AND CTE.X - CTE.RN = 1; --Some kind of way to calculate consecutive years. Doesn't account for the current year and gaps...
Assuming no duplicates, you can use lag():
select distinct customerid
from (
select t.*,
lag(orderyear, 3) over(partition by customerid order by orderyear) orderyear3
from mytable t
) t
where orderyear = orderyear3 + 3
A more conventional approach is to use a gaps-and-islands technique. This is convenient if you want the start and end of each series. Here, an island is a series of rows with "adjacent" order years, and you want islands that are at least 4 years long. We can identify the islands by comparing the order year against an incrementing sequence, then use aggregation:
select customerid, min(orderyear) firstorderyear, max(orderyear) lastorderyear
from (
select t.*,
row_number() over(partition by customerid order by orderyear) rn
from mytable t
) t
group by customerid, orderyear - rn
having count(*) >= 4
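Both queries above can be run against the question's sample data. The sketch below uses Python's sqlite3 (SQLite 3.25 or later, for window functions) as a stand-in for the original database; the table name mytable follows the answer.

```python
import sqlite3

rows = [(1, 2012), (1, 2013), (1, 2014), (1, 2017), (1, 2018),
        (2, 2012), (2, 2013), (2, 2014), (2, 2015), (2, 2017)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (customerid INTEGER, orderyear INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# LAG approach: the year three rows back is exactly orderyear - 3
# (assumes at most one row per customer and year).
lag_customers = con.execute("""
    SELECT DISTINCT customerid
    FROM (SELECT t.*,
                 LAG(orderyear, 3) OVER (PARTITION BY customerid ORDER BY orderyear) AS orderyear3
          FROM mytable t) t
    WHERE orderyear = orderyear3 + 3
""").fetchall()

# Gaps-and-islands: orderyear - row_number stays constant within a run of consecutive years.
islands = con.execute("""
    SELECT customerid, MIN(orderyear) AS firstorderyear, MAX(orderyear) AS lastorderyear
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY orderyear) AS rn
          FROM mytable t) t
    GROUP BY customerid, orderyear - rn
    HAVING COUNT(*) >= 4
""").fetchall()

print(lag_customers)  # [(2,)]
print(islands)        # [(2, 2012, 2015)]
```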
Assuming you have no more than one row per customer and year, the simplest method is lag():
select customerid, orderyear
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = orderyear - 3;
The idea is to look 3 years back. If that year is orderyear - 3, then there are four years in a row. If your data can have duplicates, there are tweaks to the logic (they make the query slightly more complicated).
This could return duplicates, so you might just want:
select distinct customerid
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = orderyear - 3;
I have a simple solution using row number and group by:
SELECT z.customerid,
Count(*) AS consecutiveyears
FROM (SELECT customerid,
orderyear,
orderyear - Row_number()
OVER (
PARTITION BY customerid
ORDER BY orderyear) AS Grp
FROM mytable)z
GROUP BY z.customerid, z.grp
HAVING Count(*) >= 4

Retrieve rows in SQL based on Maximum value in multiple columns

I have a SQL table with the following fields:
Company ID
Company Name
Fiscal Year
Fiscal Quarter
There are multiple records for various fiscal years and fiscal quarters for each company. I want to retrieve the rows for each company based on Maximum Fiscal Year and Maximum Fiscal Quarter. For example, if the table has the following:
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2017 | 1
1 | Test1 | 2017 | 2
1 | Test1 | 2018 | 1
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 3
2 | Test2 | 2018 | 4
The query should return the following (Only the record with the maximum fiscal year and maximum fiscal quarter for that year):
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 4
I am able to use the below query to get the records with the maximum fiscal year but not sure how to further select the maximum quarter within the year:
SELECT fp.companyId, fp.companyname, fp.fiscalyear,fp.fiscalquarter
FROM dbo.ciqFinPeriod fp
LEFT OUTER JOIN dbo.ciqFinPeriod fp2
ON (fp.companyId = fp2.companyId AND fp.fiscalyear < fp2.fiscalyear)
WHERE fp2.companyId IS NULL
Thank you so much for any assistance!
If you have a list of companies, I would simply do:
select fp.*
from Companies c outer apply
(select top (1) fp.*
from dbo.ciqFinPeriod fp
where fp.companyId = c.companyId
order by fp.fiscalyear desc, fp.fiscalquarter desc
) fp;
If not, then row_number() is probably the simplest method:
select fp.*
from (select fp.*,
row_number() over (partition by fp.companyId order by fp.fiscalyear desc, fp.fiscalquarter desc) as seqnum
from dbo.ciqFinPeriod fp
) fp
where seqnum = 1;
Or the somewhat more abstruse (clever?):
select top (1) with ties fp.*
from dbo.ciqFinPeriod fp
order by row_number() over (partition by fp.companyId order by fp.fiscalyear desc, fp.fiscalquarter desc)
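The ROW_NUMBER() variant can be tried on the question's sample rows with Python's sqlite3. SQLite has no TOP, OUTER APPLY or dbo schema, so this sketch only covers the ROW_NUMBER() query and assumes a plain ciqFinPeriod table.

```python
import sqlite3

rows = [(1, "Test1", 2017, 1), (1, "Test1", 2017, 2), (1, "Test1", 2018, 1),
        (1, "Test1", 2018, 2), (2, "Test2", 2018, 3), (2, "Test2", 2018, 4)]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ciqFinPeriod (companyId INTEGER, companyname TEXT,"
    " fiscalyear INTEGER, fiscalquarter INTEGER)"
)
con.executemany("INSERT INTO ciqFinPeriod VALUES (?, ?, ?, ?)", rows)

latest = con.execute("""
    SELECT companyId, companyname, fiscalyear, fiscalquarter
    FROM (SELECT fp.*,
                 -- seqnum 1 = latest year, and the latest quarter within that year
                 ROW_NUMBER() OVER (PARTITION BY fp.companyId
                                    ORDER BY fp.fiscalyear DESC, fp.fiscalquarter DESC) AS seqnum
          FROM ciqFinPeriod fp) fp
    WHERE seqnum = 1
    ORDER BY companyId
""").fetchall()
print(latest)  # [(1, 'Test1', 2018, 2), (2, 'Test2', 2018, 4)]
```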
I've had some success with the following, which gives the same output as your example.
create table #table
(
CompanyID int,
CompanyName varchar(200),
Year int,
Quater int
)
insert into #table (CompanyID,CompanyName,Year,Quater)
VALUES
('1','Test1','2017','1'),
('1','Test1','2017','2'),
('1','Test1','2018','1'),
('1','Test1','2018','2'),
('2','Test2','2018','3'),
('2','Test2','2018','4')
SELECT CompanyID,CompanyName,Year,Quater
FROM
(
Select CompanyID,CompanyName,Year,Quater
, ROW_NUMBER() OVER(PARTITION BY CompanyID ORDER BY Year desc,Quater DESC)
as RowNum
from #table
) X WHERE RowNum = 1
drop table #table
SELECT companyId, companyname, MAX(fiscalyear), MAX(fiscalquarter) FROM dbo.ciqFinPeriod GROUP BY companyId, companyname
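Taking MAX(fiscalyear) and MAX(fiscalquarter) independently can pair the latest year with a quarter from an earlier year. This sketch, using Python's sqlite3 and hypothetical rows where Test1 has 2017 Q4 but only 2018 Q1 and Q2, compares that query with a correlated-subquery alternative (a swapped-in technique, not part of the answer) that takes the quarter only from the latest year.

```python
import sqlite3

# Hypothetical data: Test1 reported Q4 in 2017 but only up to Q2 in 2018.
rows = [(1, "Test1", 2017, 4), (1, "Test1", 2018, 1), (1, "Test1", 2018, 2)]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ciqFinPeriod (companyId INTEGER, companyname TEXT,"
    " fiscalyear INTEGER, fiscalquarter INTEGER)"
)
con.executemany("INSERT INTO ciqFinPeriod VALUES (?, ?, ?, ?)", rows)

# Independent maxima: the year and the quarter may come from different rows.
independent = con.execute("""
    SELECT companyId, companyname, MAX(fiscalyear), MAX(fiscalquarter)
    FROM ciqFinPeriod
    GROUP BY companyId, companyname
""").fetchall()

# Quarter taken only from rows in the company's latest fiscal year.
nested = con.execute("""
    SELECT fp.companyId, fp.companyname, fp.fiscalyear, MAX(fp.fiscalquarter)
    FROM ciqFinPeriod fp
    WHERE fp.fiscalyear = (SELECT MAX(fp2.fiscalyear)
                           FROM ciqFinPeriod fp2
                           WHERE fp2.companyId = fp.companyId)
    GROUP BY fp.companyId, fp.companyname, fp.fiscalyear
""").fetchall()

print(independent)  # [(1, 'Test1', 2018, 4)]  (2018 Q4 does not exist)
print(nested)       # [(1, 'Test1', 2018, 2)]
```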

Cumulative SUM based on columns

I have a table of id and year values, with one row per record.
I want a cumulative sum based on the id and year. For example, for id 1 and year 2010 the sum of records will be 2; for id 2 and year 2010 it will be 1; and for id 2 and year 2011 it will be 1 + 1 = 2. In other words, I need a running total for each id, in ascending order of year.
Similarly, for id 3 the sum will be 1, and for id 4 it will be 1. For id 5 it will be 3 for 2014; for 2015 it will be the previous year's count plus the current year's count, i.e. 3 + 1 = 4; and for 2016 it will be 3 + 1 + 1 = 5. How can this be done? Could someone please help?
No need to make things more complicated than they need to be...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
ID INT NOT NULL,
[Year] INT NOT NULL
);
INSERT #TestData (ID, Year) VALUES
(1, 2010), (1, 2010), (2, 2010), (2, 2011),
(3, 2012), (4, 2013), (5, 2014), (5, 2014),
(5, 2014), (5, 2015), (5, 2016);
--=======================================
SELECT
tdg.ID,
tdg.Year,
RunningCount = SUM(tdg.Cnt) OVER (PARTITION BY tdg.ID ORDER BY tdg.Year ROWS UNBOUNDED PRECEDING)
FROM (
SELECT td.ID, td.Year, Cnt = COUNT(1)
FROM #TestData td
GROUP BY td.ID, td.Year
) tdg;
Results...
ID Year RunningCount
----------- ----------- ------------
1 2010 2
2 2010 1
2 2011 2
3 2012 1
4 2013 1
5 2014 3
5 2015 4
5 2016 5
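The group-then-window approach can be reproduced with Python's sqlite3 (window functions need SQLite 3.25 or later). This sketch mirrors the #TestData example above with a regular table, since SQLite has no # temporary-table syntax, and writes the frame out in full as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
import sqlite3

rows = [(1, 2010), (1, 2010), (2, 2010), (2, 2011), (3, 2012), (4, 2013),
        (5, 2014), (5, 2014), (5, 2014), (5, 2015), (5, 2016)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TestData (ID INTEGER NOT NULL, Year INTEGER NOT NULL)")
con.executemany("INSERT INTO TestData VALUES (?, ?)", rows)

running = con.execute("""
    SELECT tdg.ID, tdg.Year,
           -- running total of the per-year counts, restarting for each ID
           SUM(tdg.Cnt) OVER (PARTITION BY tdg.ID ORDER BY tdg.Year
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningCount
    FROM (SELECT ID, Year, COUNT(*) AS Cnt
          FROM TestData
          GROUP BY ID, Year) tdg
    ORDER BY tdg.ID, tdg.Year
""").fetchall()
for row in running:
    print(row)
```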
This is more nesting than I would like, and I feel there is a better way to do this, maybe with only one window function, but I can't get past your data not having a unique row.
SELECT id,
year ,sum(c) OVER (
PARTITION BY id ORDER BY year rows unbounded preceding
)
FROM (
SELECT id,
year,
count(rn) c
FROM (
SELECT id,
year,
row_number() OVER (
ORDER BY year
) AS rn
FROM your_table -- you will need to change this to your table
) a
GROUP BY id,
year
) a
What we do is first add a row number so that every row is unique; then we count those row numbers per id and year, and finally use a window function to compute a running total of those counts by year.
There are many ways to do this. Here is one of them, with an inner query:
create table #table_name
(
UserID int,
Year int
)
INSERT INTO #table_name (UserID, Year)
VALUES
(1, 2010)
,(1,2010)
,(2,2010)
,(2,2011)
,(3,2012)
,(4,2013)
,(5,2014)
,(5,2014)
,(5,2014)
,(5,2015)
,(5,2016)
SELECT
UserID
,YEAR
,(SELECT COUNT(Year) FROM #table_name WHERE Year <= tt.Year AND UserID = tt.UserID)
FROM
#table_name AS tt
GROUP BY UserID, Year
You can also use ROW_NUMBER() OVER (edit: see the other answer for this technique; I think it is a bit too complicated for such a simple task). The query above returns your required output:
+--------+------+-------+
| UserID | Year | COUNT |
+--------+------+-------+
| 1 | 2010 | 2 |
| 2 | 2010 | 1 |
| 2 | 2011 | 2 |
| 3 | 2012 | 1 |
| 4 | 2013 | 1 |
| 5 | 2014 | 3 |
| 5 | 2015 | 4 |
| 5 | 2016 | 5 |
+--------+------+-------+

Postgresql: Split line into 4 lines

My goal is to split a single row into 4 rows using a SQL script.
My SQL result contains the year, the quarter, the month and an arbitrary value. Now I would also like to output the week of the month (1-4) without having to add it as a column to the table.
In addition, the value should be divided by four.
Thus, from this result:
year | quarter | month | value
2016 | 1 | 1 | 78954
This result:
year | quarter | month | week | value
2016 | 1 | 1 | 1 | 19738,5
2016 | 1 | 1 | 2 | 19738,5
2016 | 1 | 1 | 3 | 19738,5
2016 | 1 | 1 | 4 | 19738,5
I have no idea how to implement this.
I hope someone can help me.
Best regards
You could do it with a cartesian join:
SELECT a.year, a.quarter, a.month, b.week, a.value / 4.0 AS value -- 4.0 avoids integer division
FROM a, (SELECT UNNEST(ARRAY[1, 2, 3, 4]) as week) b
Just use union:
select year, quarter, month, 1 as week, value / 4.0 as value from the_table -- the_table stands for your table
union all
select year, quarter, month, 2 as week, value / 4.0 as value from the_table
union all
select year, quarter, month, 3 as week, value / 4.0 as value from the_table
union all
select year, quarter, month, 4 as week, value / 4.0 as value from the_table
You can also use generate_series() for that:
select t.year, t.quarter, t.month, w.week, t.value / 4.0 as value
from the_table t
cross join generate_series(1,4) as w(week)
order by t.year, t.quarter, t.month, w.week;
Using generate_series() is more flexible if you need to change the number of repeated rows, although "weeks per month" doesn't really need that flexibility.
Or you can do it in a very scientific-looking way :-)
WITH series as (select generate_series(1,4,1) as week ),
data as (SELECT 2016 as year, 1 as quarter, 1 as month, 78954 as value)
SELECT d.year, d.quarter, d.month, s.week, d.value/(SELECT count(*) FROM series)::numeric
FROM data d JOIN series s ON true
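The cross-join idea can also be tried outside PostgreSQL. This sketch uses Python's sqlite3, where generate_series() is not available by default, so a recursive CTE named weeks (an illustrative name) produces the numbers 1 to 4 instead; the_table is a placeholder table holding the question's sample row. Dividing by 4.0 keeps the fractional part that integer division by 4 would drop.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE the_table (year INTEGER, quarter INTEGER, month INTEGER, value INTEGER)")
con.execute("INSERT INTO the_table VALUES (2016, 1, 1, 78954)")

split = con.execute("""
    WITH RECURSIVE weeks(week) AS (
        SELECT 1
        UNION ALL
        SELECT week + 1 FROM weeks WHERE week < 4  -- stands in for generate_series(1, 4)
    )
    SELECT t.year, t.quarter, t.month, w.week,
           t.value / 4.0 AS value                  -- 4.0 avoids integer division
    FROM the_table t
    CROSS JOIN weeks w
    ORDER BY t.year, t.quarter, t.month, w.week
""").fetchall()
for row in split:
    print(row)  # (2016, 1, 1, <week>, 19738.5)
```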