SQL Server: Split Columns to Count Totals By Year in same row

T-SQL question: I have a breakdown of the visits by year of a person. I want to get a count of the total visits per year and then create a column for each year to summarize the total. The way I have it now it only returns the totals per year for the columns on separate rows. How can I combine them?
Example temp table and code...
CREATE TABLE #Events
(
Col1 int PRIMARY KEY,
Person Varchar(20),
VisitYear INT,
VisitInfo Varchar(20)
)
INSERT INTO #Events
VALUES (1, 'User1', '2017', 'Combo'), (2, 'User1', '2017', 'Special'),
(3, 'User1', '2018', 'ComboBig'), (4, 'User2', '2017', 'Special'),
(5, 'User2', '2017', 'ComboBig'), (6, 'User2', '2018', 'ComboBig'),
(7, 'User2', '2018', 'Special'), (8, 'User2', '2018', 'Special2'),
(9, 'User3', '2018', 'Combo')
SELECT DISTINCT
Person,
VisitYear,
VisitInfo,
COUNT(Person) OVER(PARTITION BY Person, VisitYear) AS TtlPerYear
INTO
#EventCountPerYear
FROM
#Events E
SELECT *
FROM #EventCountPerYear
SELECT DISTINCT
E1.Person,
CASE WHEN E1.VisitYear IN ('2017') THEN E1.TtlPerYear END AS '2017_Visits',
CASE WHEN E1.VisitYear IN ('2018') THEN E1.TtlPerYear END AS '2018_Visits'
FROM
#EventCountPerYear E1
Right now the results come split, with the year counts on separate rows. Is there a way to have it return just one clean row of data per person?
[Image: current results]
[Image: desired results]

On SQL-Server you could use a PIVOT:
SELECT Person, [2017], [2018]
FROM (SELECT Person, VisitYear FROM #Events) src
PIVOT (COUNT(VisitYear) FOR VisitYear IN ([2017],[2018])) pvt
GO
Person | 2017 | 2018
:----- | ---: | ---:
User1 | 2 | 1
User2 | 2 | 3
User3 | 0 | 1

You are on the path to using conditional aggregation. This looks like:
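SELECT E1.Person,
-- MAX rather than SUM: #EventCountPerYear holds one row per VisitInfo,
-- and each of those rows already carries the full yearly total.
MAX(CASE WHEN E1.VisitYear IN (2017) THEN E1.TtlPerYear END) AS [2017_Visits],
MAX(CASE WHEN E1.VisitYear IN (2018) THEN E1.TtlPerYear END) AS [2018_Visits]
FROM #EventCountPerYear E1
GROUP BY E1.Person;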
Notes:
Don't put single quotes around numeric values, such as the year.
Don't use single quotes for column aliases. In fact, only use single quotes for string and date constants.
This adds the GROUP BY clause.
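As a cross-check of the conditional aggregation idea, here is a minimal sketch that runs the same pattern against the question's sample data using Python's built-in sqlite3 module instead of SQL Server. The table name Events and the aliases Visits_2017 and Visits_2018 are assumptions made for this illustration. It counts straight from the raw rows, so the intermediate #EventCountPerYear table is not needed.

```python
import sqlite3

# In-memory stand-in for the #Events temp table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events ("
    "Col1 INTEGER PRIMARY KEY, Person TEXT, VisitYear INTEGER, VisitInfo TEXT)"
)
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?, ?)",
    [
        (1, "User1", 2017, "Combo"), (2, "User1", 2017, "Special"),
        (3, "User1", 2018, "ComboBig"), (4, "User2", 2017, "Special"),
        (5, "User2", 2017, "ComboBig"), (6, "User2", 2018, "ComboBig"),
        (7, "User2", 2018, "Special"), (8, "User2", 2018, "Special2"),
        (9, "User3", 2018, "Combo"),
    ],
)

# One output row per person; each SUM only counts the rows of its own year.
rows = conn.execute(
    """
    SELECT Person,
           SUM(CASE WHEN VisitYear = 2017 THEN 1 ELSE 0 END) AS Visits_2017,
           SUM(CASE WHEN VisitYear = 2018 THEN 1 ELSE 0 END) AS Visits_2018
    FROM Events
    GROUP BY Person
    ORDER BY Person
    """
).fetchall()

for row in rows:
    print(row)
```

The SELECT statement itself uses only standard SQL, so it should carry over to T-SQL once the table name is changed back to #Events.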

Related

Do different sums in the same query

I currently have an issue on a query:
I have 2 tables.
Table 1:
Table 2:
I'm trying to join both tables on DateHour (that works) and for each campaign, for each PRF_ID, for each LENGTH and for each Type, to calculate count the occurrences of the HPP column of table 2, year per year.
So for instance, for a given PRF_ID, length, type and campaign, I will have a range of dates in Table 1 between 01/04/2019 and 01/04/2020.
In this case, in my query, I need a new column giving me, for all the dates between 01/04/2019 and 31/12/2019, the sum of HPP occurrences in this period.
For 2020, the sum would be between 01/01/2020 and 01/04/2020.
I tried doing something like this:
SELECT Table1.DateHour,
SUM(Table2.HPP) OVER (PARTITION BY YEAR(Table2.DateHour))
FROM Table1
LEFT JOIN Table2 on Table2.DateHour=Table1.DateHour
But that gives me really odd results, the OVER PARTITION BY does not seem to work.
Your question is confusing because it mixes terminology.
Count versus sum
... to calculate count the occurrences ... the sum would be ...
Counting occurrences is not the same as adding them up. Every record that can be joined counts as an occurrence. Calculating the sum means adding up the values of a column. I added both calculations to the solution below (see IsHour_Sum versus Table2_Count).
Grouping on combinations
Do different sums in the same query [question title]
... and for each campaign, for each PRF_ID, for each LENGTH and for each Type ...
Do you want to aggregate over each combination of those columns or do you want to aggregate over each column individually? I have assumed you are after the combinations in my solution. Example to clarify:
If
column A has 3 possible values (A1, A2, A3) and
column B has 2 possible values (B1, B2)
Then
there are 5 counts (3 + 2) when aggregating (A) and (B) individually
there are 6 counts (3 * 2) when aggregating each combination of (A,B)
Again:
A1 -> count 1    vs    A1,B1 -> count 1
A2 -> count 2          A1,B2 -> count 2
A3 -> count 3          A2,B1 -> count 3
B1 -> count 4          A2,B2 -> count 4
B2 -> count 5          A3,B1 -> count 5
                       A3,B2 -> count 6
Sample data
I left out the column Value from table1 because it is not part of your question. I also changed the dates for table2 to 2018-05-23 to match with the majority of the table1 records. Otherwise all counts and sums would be 0.
declare @table1 table
(
DateHour datetime,
PRF_ID int,
Campaign nvarchar(5),
Length int,
ContractType nvarchar(5)
);
insert into @table1 (DateHour, PRF_ID, Campaign, Length, ContractType) values
('2018-05-23 00:00', 1, 'Q218', 1, 'G'),
('2018-05-23 01:00', 1, 'Q218', 1, 'G'),
('2018-05-23 02:00', 1, 'Q218', 1, 'G'),
('2020-05-23 03:00', 1, 'Q120', 1, 'G'),
('2018-05-23 04:00', 1, 'Q218', 1, 'G'),
('2019-07-23 01:00', 1, 'Q219', 1, 'G');
declare @table2 table
(
DateHour datetime,
HPP int
);
insert into @table2 (DateHour, HPP) values
('2018-05-23 00:00', 0),
('2018-05-23 01:00', 0),
('2018-05-23 02:00', 1),
('2018-05-23 03:00', 0),
('2018-05-23 04:00', 1),
('2018-05-23 05:00', 0),
('2018-05-23 06:00', 0),
('2018-05-23 07:00', 0);
Solution
The easiest way to aggregate on the year of the dates is to split off that part as a new column instead of using an over(partition by ...) construction. If you do not need the new column Year in your output, then you can simply remove it from the field list (after select), but it must remain in the grouping clause (after group by).
select year(t1.DateHour) as 'Year',
t1.PRF_ID,
t1.Campaign,
t1.Length,
t1.ContractType,
isnull(sum(t2.HPP), 0) as 'IsHour_Sum',
count(t2.DateHour) as 'Table2_Count'
from @table1 t1
left join @table2 t2
on t2.DateHour = t1.DateHour
/* -- specify date filter as required
where t1.DateHour >= '2019-04-01 00:00'
and t1.DateHour < '2020-04-01 00:00'
*/
group by year(t1.DateHour),
t1.PRF_ID,
t1.Campaign,
t1.Length,
t1.ContractType;
Result
Year PRF_ID Campaign Length ContractType IsHour_Sum Table2_Count
----------- ----------- -------- ----------- ------------ ----------- ------------
2018 1 Q218 1 G 2 4
2019 1 Q219 1 G 0 0
2020 1 Q120 1 G 0 0
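The result above can be reproduced outside SQL Server with a small sketch in Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the answer's exact code: SQLite has no year() or isnull(), so strftime('%Y', ...) and COALESCE stand in for them, and the dates are stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (DateHour TEXT, PRF_ID INTEGER, Campaign TEXT,
                         Length INTEGER, ContractType TEXT);
    CREATE TABLE table2 (DateHour TEXT, HPP INTEGER);
    INSERT INTO table1 VALUES
        ('2018-05-23 00:00', 1, 'Q218', 1, 'G'),
        ('2018-05-23 01:00', 1, 'Q218', 1, 'G'),
        ('2018-05-23 02:00', 1, 'Q218', 1, 'G'),
        ('2020-05-23 03:00', 1, 'Q120', 1, 'G'),
        ('2018-05-23 04:00', 1, 'Q218', 1, 'G'),
        ('2019-07-23 01:00', 1, 'Q219', 1, 'G');
    INSERT INTO table2 VALUES
        ('2018-05-23 00:00', 0), ('2018-05-23 01:00', 0),
        ('2018-05-23 02:00', 1), ('2018-05-23 03:00', 0),
        ('2018-05-23 04:00', 1), ('2018-05-23 05:00', 0),
        ('2018-05-23 06:00', 0), ('2018-05-23 07:00', 0);
    """
)

# Group on the year part of table1.DateHour; the LEFT JOIN keeps years
# that have no match in table2, which then show a sum and a count of 0.
yearly = conn.execute(
    """
    SELECT CAST(strftime('%Y', t1.DateHour) AS INTEGER) AS "Year",
           t1.PRF_ID, t1.Campaign, t1.Length, t1.ContractType,
           COALESCE(SUM(t2.HPP), 0) AS IsHour_Sum,
           COUNT(t2.DateHour)       AS Table2_Count
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.DateHour = t1.DateHour
    GROUP BY strftime('%Y', t1.DateHour),
             t1.PRF_ID, t1.Campaign, t1.Length, t1.ContractType
    ORDER BY 1
    """
).fetchall()

for row in yearly:
    print(row)
```

COUNT(t2.DateHour) counts only the rows where the join found a match, which is why the sum and the count can differ (2 versus 4 for 2018).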

Unable to pivot data in SQL

I have an SQL query like
select name from customers where id in (1,2,3,4,5)
this will return me 5 rows.
What I am trying is
select Q1,Q2,Q3,Q4,Q5 from(select name from customers where id in (1,2,3,4,5)
)d pivot( max(name) for names in (Q1,Q2,Q3,Q4,Q5)
) piv;
to convert these 5 rows into 5 columns.
But I am getting null values in columns.
I don't know where I am going wrong.
Any kind of help or suggestion would be appreciated.
Thanks
You can try it this way. You are pivoting on name and also using name as the aggregated value. First make sure there is a value for those ids, then aggregate the column whose value you actually want to show under each pivoted name:
create table #customers(id int, name varchar(50), fee int)
insert into #customers values
(1, 'Q1', 100),
(2, 'Q2', 200),
(3, 'Q3', 300),
(4, 'Q4', 400),
(5, 'Q5', 500),
(6, 'Q5', 600),
(7, 'Q5', 700)
select Q1,Q2,Q3,Q4,Q5
from(select name, fee from #customers where id in (1,2,3,4,5)
)d pivot(SUM(fee) for name in (Q1,Q2,Q3,Q4,Q5)
) piv;
OUTPUT:
Q1 Q2 Q3 Q4 Q5
100 200 300 400 500
Which value do you want to display under each name? If you mention that, we can provide a more exact solution. I hope you can adapt your code accordingly.
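PIVOT matches the values of the FOR column against the names in the IN list, so if no customer is literally named Q1 ... Q5, every pivoted column comes back NULL. If the goal is simply to place the names for ids 1 to 5 side by side, one alternative is conditional aggregation keyed on id. The following is a minimal sketch in Python's built-in sqlite3 module; the customer names are made up for the illustration, and the interpretation of what the asker wants is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical names; the question does not show the real data.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee"), (5, "Eve"), (6, "Flo")],
)

# Each CASE picks the name of exactly one id; MAX collapses the
# five filtered rows into a single output row.
pivoted = conn.execute(
    """
    SELECT MAX(CASE WHEN id = 1 THEN name END) AS Q1,
           MAX(CASE WHEN id = 2 THEN name END) AS Q2,
           MAX(CASE WHEN id = 3 THEN name END) AS Q3,
           MAX(CASE WHEN id = 4 THEN name END) AS Q4,
           MAX(CASE WHEN id = 5 THEN name END) AS Q5
    FROM customers
    WHERE id IN (1, 2, 3, 4, 5)
    """
).fetchone()

print(pivoted)
```

The query is plain SQL, so the same SELECT should also run on SQL Server without PIVOT.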

Querying SQL Server table with different values in same column with same ID [duplicate]

I have an SQL Server 2012 table with ID, First Name and Last name. The ID is unique per person but due to an error in the historical feed, different people were assigned the same id.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEF T
There are two IDs which are present multiple times: 1 and 4. The rows with ID 4 are identical; I don't want these in my result. For the rows with ID 1, the first name is the same but the last name is different in one row. I want only those IDs where the ID is the same but the first or last name differs.
I tried loading ID's which have multiple occurrences into a temp table and tried to compare it against the parent table albeit unsuccessfully. Any other ideas that I can try and implement?
This is the output I am looking for
ID
---
1
If you want the ids, then use aggregation and having:
select id
from t
group by id
having min(firstname) <> max(firstname) or min(lastname) <> max(lastname);
Try This:
CREATE TABLE #myTable(id INT, firstname VARCHAR(50), lastname VARCHAR(50))
INSERT INTO #myTable VALUES
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'N'),
(2, 'BCD', 'S'),
(3, 'CDE', 'T'),
(4, 'DEF', 'T'),
(4, 'DEF', 'T')
SELECT id FROM (
SELECT DISTINCT id, firstname, lastname
FROM #myTable) t GROUP BY id HAVING COUNT(*)>1
OUTPUT is : 1
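Both queries above can be checked side by side with a short sketch in Python's built-in sqlite3 module (the table name people is an assumption for the illustration). On the sample data they agree and return only ID 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "N"),
        (2, "BCD", "S"), (3, "CDE", "T"), (4, "DEF", "T"), (4, "DEF", "T"),
    ],
)

# Approach 1: an id is flagged when the smallest and largest value of
# either name column differ, i.e. the column is not constant per id.
by_min_max = [r[0] for r in conn.execute(
    """
    SELECT id FROM people
    GROUP BY id
    HAVING MIN(firstname) <> MAX(firstname)
        OR MIN(lastname)  <> MAX(lastname)
    """
)]

# Approach 2: collapse exact duplicates first, then flag ids that
# still have more than one distinct (firstname, lastname) pair.
by_distinct = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM (SELECT DISTINCT id, firstname, lastname FROM people) AS t
    GROUP BY id
    HAVING COUNT(*) > 1
    """
)]

print(by_min_max, by_distinct)
```

One difference to keep in mind: MIN and MAX ignore NULLs, while DISTINCT keeps a NULL name as its own value, so the two approaches can disagree when a name is missing.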

How can I get the sum of one group of data

I am having trouble producing this list of data. Almost everything I tried did not go as expected.
This is my input data:
Code Name total
01 First 50
02 Last 20
11 First 10
12 Last 25
21 First 15
22 Last 15
This is the output I would like:
Code Name total
01 First 50
02 Last 20
GROUP: 0 - 70
11 First 10
12 Last 25
GROUP: 1 - 35
21 First 15
22 Last 15
GROUP: 2 - 30
After the first two rows I need a third row (not a column) that represents their group (group zero) and holds the sum of those two rows, and the same for the other two groups.
This was my original idea when I saw the question. Grouping sets are probably a better choice than COMPUTE, since Microsoft deprecated COMPUTE.
with data as (
select Code, substring(Code, 1, len(Code) - 1) as Prefix, Name, Total from T
)
select
case when grouping(Name) = 1 then Prefix else min(Code) end as Code,
case when grouping(Name) = 1 then '-' else Name end as Name,
sum(Total) as Total
from data
group by grouping sets ( (Prefix, Name), (Prefix) )
order by Prefix, grouping(Name), Code
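For engines without GROUPING SETS, the same detail-plus-subtotal layout can be sketched with UNION ALL. The example below uses Python's built-in sqlite3 module and the question's data; the helper columns Prefix and IsSubtotal and the 'GROUP: n' label format are assumptions chosen to mimic the desired output, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Code TEXT, Name TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("01", "First", 50), ("02", "Last", 20),
        ("11", "First", 10), ("12", "Last", 25),
        ("21", "First", 15), ("22", "Last", 15),
    ],
)

# Detail rows (IsSubtotal = 0) and one summed row per code prefix
# (IsSubtotal = 1), sorted so each subtotal follows its own details.
report = conn.execute(
    """
    SELECT Code, Name, Total
    FROM (
        SELECT substr(Code, 1, length(Code) - 1) AS Prefix,
               0 AS IsSubtotal, Code, Name, Total
        FROM T
        UNION ALL
        SELECT substr(Code, 1, length(Code) - 1),
               1,
               'GROUP: ' || substr(Code, 1, length(Code) - 1),
               '-',
               SUM(Total)
        FROM T
        GROUP BY substr(Code, 1, length(Code) - 1)
    )
    ORDER BY Prefix, IsSubtotal, Code
    """
).fetchall()

for row in report:
    print(row)
```

The group rows come out with totals 70, 35 and 30, matching the output requested in the question.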
Fixed a few problems with my old query. Here's a SQL Fiddle.
declare @table table (code varchar(10), name varchar(10), total int);
insert into @table(code, name, total) values
('01', 'First', 50),
('02', 'Last', 20),
('11', 'First', 10),
('12', 'Last', 25),
('21', 'First', 15),
('22', 'Last', 15);
select * from @table;
--select code, name, sum(total)
-- from @table
-- group by rollup (substring(code,1,1), name);
select code, name, total,
sum(total) over (partition by substring(code,1,1)) as subtotal
from @table;
COMPUTE is no longer supported.
A stretch, and not tested:
SELECT Code, Name, Total
FROM Table
ORDER BY Code, Name
COMPUTE SUM(Total) BY SubString(Code,1,1);

AVG and COUNT in SQL Server

I have a rating system in which any person may rate another. The same evaluator can rate the same person more than once. When calculating averages, I would like to include only the most recent rating from each evaluator.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <- ignored because there is a newer rating of person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all Evaluatee's, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
This might look like it carries an implicit assumption that no two entries share the same date, but that is actually not a problem: if such entries can exist, you cannot decide which of them was made later anyway; you could only choose randomly between them. As shown here, they are both included and averaged, which might be the best solution you can get for that border case (although it slightly favors that person, giving him two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index - the obvious primary key choice here being the columns (Evaluator, Evaluatee, Date).
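The tie behavior described above can be seen in a small sketch using Python's built-in sqlite3 module. It reuses the five sample ratings and adds two hypothetical ratings from Person 3 for Person 2 on the same date (an assumption made purely to trigger the border case). Note that SQLite's AVG returns a floating-point value, whereas T-SQL's AVG over an int column returns an int.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ratings (Evaluator TEXT, Evaluatee TEXT, Rating INTEGER, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?, ?)",
    [
        ("Person 1", "Person 2", 5, "2011-02-01"),
        ("Person 1", "Person 2", 2, "2011-03-01"),
        ("Person 2", "Person 1", 6, "2011-02-01"),
        ("Person 2", "Person 1", 3, "2011-03-01"),
        ("Person 3", "Person 1", 5, "2011-05-01"),
        # Hypothetical tie: same evaluator, same evaluatee, same date.
        ("Person 3", "Person 2", 5, "2011-04-01"),
        ("Person 3", "Person 2", 8, "2011-04-01"),
    ],
)

# Keep a rating only if the same evaluator has no strictly newer rating
# for the same evaluatee; rows tied on date therefore both survive.
averages = conn.execute(
    """
    SELECT Evaluatee, AVG(Rating)
    FROM Ratings r1
    WHERE NOT EXISTS (
        SELECT 1 FROM Ratings r2
        WHERE r1.Evaluatee = r2.Evaluatee
          AND r1.Evaluator = r2.Evaluator
          AND r1.Date < r2.Date)
    GROUP BY Evaluatee
    ORDER BY Evaluatee
    """
).fetchall()

print(averages)
```

Person 2's average is taken over three ratings (2, 5 and 8), so Person 3 effectively gets two votes, which is the border case the paragraph above describes.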
declare @T table
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into @T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from @T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
This is possible to do, but it can get REALLY hairy: SQL was not designed to compare rows, only columns. I would strongly recommend you keep an additional table containing only the most recent data, and store the rest in an archive table.
If you must do it this way, then I'll need a full table structure to try to write a query for this. In particular I need to know which are the unique indexes.