Do different sums in the same query - sql

I currently have an issue on a query:
I have 2 tables.
Table 1:
Table 2:
I'm trying to join both tables on DateHour (that works) and for each campaign, for each PRF_ID, for each LENGTH and for each Type, to calculate count the occurrences of the HPP column of table 2, year per year.
So for instance, for a given PRF_ID, length, type and campaign, I will have a range of dates in Table 1 between 01/04/2019 and 01/04/2020.
In this case, in my query, I need a new column giving me, for all the dates between 01/04/2019 and 31/12/2019, the sum of HPP occurrences in this period.
For 2020, the sum would be between 01/01/2020 and 01/04/2020.
I tried doing something like this:
SELECT Table1.DateHour,
SUM(Table2.HPP) OVER (PARTITION BY YEAR(Table2.DateHour))
FROM Table1
LEFT JOIN Table2 on Table2.DateHour=Table1.DateHour
But that gives me really odd results, the OVER PARTITION BY does not seem to work.

Your question is confusing because it mixes terminology.
Count versus sum
... to calculate count the occurrences ... the sum would be ...
Counting occurrences is not the same as adding them up. Every record that can be joined counts as an occurrence. Calculating the sum means adding up the values of a column. I added both calculations to the solution below (see IsHour_Sum versus Table2_Count).
Grouping on combinations
Do different sums in the same query [question title]
... and for each campaign, for each PRF_ID, for each LENGTH and for each Type ...
Do you want to aggregate over each combination of those columns or do you want to aggregate over each column individually? I have assumed you are after the combinations in my solution. Example to clarify:
If
column A has 3 possible values (A1, A2, A3) and
column B has 2 possible values (B1, B2)
Then
there are 5 counts (3 + 2) when aggregating (A) and (B) individually
there are 6 counts (3 * 2) when aggregating each combination of (A,B)
Again:
A1 -> count 1    vs    A1,B1 -> count 1
A2 -> count 2          A1,B2 -> count 2
A3 -> count 3          A2,B1 -> count 3
B1 -> count 4          A2,B2 -> count 4
B2 -> count 5          A3,B1 -> count 5
                       A3,B2 -> count 6
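The difference between the two readings can be checked with a small sketch. It is written in Python with the standard-library sqlite3 module rather than SQL Server, and the table name demo and its columns A and B are hypothetical, chosen only to mirror the example above.

```python
import sqlite3

# Hypothetical table "demo" holding every (A, B) pair from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (A TEXT, B TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(a, b) for a in ("A1", "A2", "A3") for b in ("B1", "B2")],
)

# Aggregating each column individually: one result row per distinct value.
per_a = conn.execute("SELECT A, COUNT(*) FROM demo GROUP BY A").fetchall()
per_b = conn.execute("SELECT B, COUNT(*) FROM demo GROUP BY B").fetchall()

# Aggregating each combination: one result row per distinct (A, B) pair.
per_combo = conn.execute(
    "SELECT A, B, COUNT(*) FROM demo GROUP BY A, B"
).fetchall()

print(len(per_a) + len(per_b))  # 3 + 2 = 5 counts
print(len(per_combo))           # 3 * 2 = 6 counts
```

Listing several columns in one group by, as the solution does, corresponds to the second reading.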
Sample data
I left out the column Value from table1 because it is not part of your question. I also changed the dates for table2 to 2018-05-23 to match with the majority of the table1 records. Otherwise all counts and sums would be 0.
declare @table1 table
(
    DateHour datetime,
    PRF_ID int,
    Campaign nvarchar(5),
    Length int,
    ContractType nvarchar(5)
);
insert into @table1 (DateHour, PRF_ID, Campaign, Length, ContractType) values
('2018-05-23 00:00', 1, 'Q218', 1, 'G'),
('2018-05-23 01:00', 1, 'Q218', 1, 'G'),
('2018-05-23 02:00', 1, 'Q218', 1, 'G'),
('2020-05-23 03:00', 1, 'Q120', 1, 'G'),
('2018-05-23 04:00', 1, 'Q218', 1, 'G'),
('2019-07-23 01:00', 1, 'Q219', 1, 'G');
declare @table2 table
(
    DateHour datetime,
    HPP int
);
insert into @table2 (DateHour, HPP) values
('2018-05-23 00:00', 0),
('2018-05-23 01:00', 0),
('2018-05-23 02:00', 1),
('2018-05-23 03:00', 0),
('2018-05-23 04:00', 1),
('2018-05-23 05:00', 0),
('2018-05-23 06:00', 0),
('2018-05-23 07:00', 0);
Solution
The easiest way to aggregate on the year of the dates is to split off that part as a new column instead of using an over(partition by ...) construction. If you do not need the new column Year in your output, then you can simply remove it from the field list (after select), but it must remain in the grouping clause (after group by).
-- one row per year and per combination of PRF_ID, Campaign, Length and ContractType
select year(t1.DateHour) as 'Year',
       t1.PRF_ID,
       t1.Campaign,
       t1.Length,
       t1.ContractType,
       isnull(sum(t2.HPP), 0) as 'IsHour_Sum',   -- sum of the HPP values
       count(t2.DateHour) as 'Table2_Count'      -- number of joined table2 rows
from @table1 t1
left join @table2 t2
    on t2.DateHour = t1.DateHour
/* -- specify date filter as required
where t1.DateHour >= '2019-04-01 00:00'
  and t1.DateHour < '2020-04-01 00:00'
*/
group by year(t1.DateHour),
         t1.PRF_ID,
         t1.Campaign,
         t1.Length,
         t1.ContractType;
Result
Year  PRF_ID  Campaign  Length  ContractType  IsHour_Sum  Table2_Count
----  ------  --------  ------  ------------  ----------  ------------
2018  1       Q218      1       G             2           4
2019  1       Q219      1       G             0           0
2020  1       Q120      1       G             0           0
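If the yearly total is needed as an extra column on every row of Table 1, as the question describes, a window function is an alternative to group by. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions) instead of SQL Server, strftime('%Y', ...) stands in for year(...), and the column name HPP_YearSum is made up. It partitions on the year of Table 1's DateHour, because the year taken from Table 2 is NULL for every row without a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (DateHour TEXT, PRF_ID INTEGER, Campaign TEXT,
                     Length INTEGER, ContractType TEXT);
CREATE TABLE table2 (DateHour TEXT, HPP INTEGER);
INSERT INTO table1 VALUES
  ('2018-05-23 00:00', 1, 'Q218', 1, 'G'),
  ('2018-05-23 01:00', 1, 'Q218', 1, 'G'),
  ('2018-05-23 02:00', 1, 'Q218', 1, 'G'),
  ('2020-05-23 03:00', 1, 'Q120', 1, 'G'),
  ('2018-05-23 04:00', 1, 'Q218', 1, 'G'),
  ('2019-07-23 01:00', 1, 'Q219', 1, 'G');
INSERT INTO table2 VALUES
  ('2018-05-23 00:00', 0), ('2018-05-23 01:00', 0),
  ('2018-05-23 02:00', 1), ('2018-05-23 03:00', 0),
  ('2018-05-23 04:00', 1), ('2018-05-23 05:00', 0),
  ('2018-05-23 06:00', 0), ('2018-05-23 07:00', 0);
""")

# Every table1 row is kept; the window sum repeats the yearly total on each row.
window_rows = conn.execute("""
SELECT t1.DateHour,
       COALESCE(SUM(t2.HPP) OVER (
           PARTITION BY strftime('%Y', t1.DateHour),
                        t1.PRF_ID, t1.Campaign, t1.Length, t1.ContractType
       ), 0) AS HPP_YearSum
FROM table1 t1
LEFT JOIN table2 t2 ON t2.DateHour = t1.DateHour
ORDER BY t1.DateHour
""").fetchall()

for row in window_rows:
    print(row)
```

With the sample data, each of the four 2018 rows carries the total 2, and the 2019 and 2020 rows carry 0.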

Related

Count Distinct not working as expected, output is equal to count

I have a table where I'm trying to count the distinct number of members per group. I know there are duplicates based on the count(distinct *) function. But when I group the rows by group and count distinct, I don't get the number I'd expect.
select count(distinct memberid), count(*)
from dbo.condition c
output:
count    count
303,781  348,722
select groupid, count(*), count(distinct memberid)
from dbo.condition c
group by groupid
output:
groupid  count    count
2        19,984   19,984
3        25,689   25,689
5        14,400   14,400
24       56,058   56,058
25       200,106  200,106
29       27,847   27,847
30       1,370    1,370
31       3,268    3,268
The two counts in the second query are equal when they shouldn't be. Does anyone know what I'm doing wrong? I need the 3rd column to add up to 303,781, not 348,722.
Thanks!
There's nothing wrong with your second query. Since you're aggregating on the "groupid" field, the output tells you that no "memberid" value is repeated within the same groupid, so within each group count(*) and count(distinct memberid) give the same number.
On the other hand, the first query aggregates over the whole table without any grouping, and its output shows that the same "memberid" values appear under different "groupid" values.
Took the liberty of adding an example that corroborates your answer:
create table aa (groupid int not null, memberid int not null);
insert into aa (groupid, memberid)
values
(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (4, 5), (5, 3);
-- per group: count(*) equals count(distinct memberid)
select groupid, count(*), count(distinct memberid)
from aa group by groupid;
-- whole table: count(*) is larger than count(distinct memberid)
select count(*), count(distinct memberid)
from aa;
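To see which members account for the gap between the two totals, one option is to list the memberid values that occur under more than one groupid. The sketch below is a rough illustration using Python's sqlite3 module and the aa sample table from the example above; the alias group_count is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aa (groupid INTEGER NOT NULL, memberid INTEGER NOT NULL);
INSERT INTO aa (groupid, memberid) VALUES
  (1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3),
  (4, 1), (4, 2), (4, 3), (4, 5), (5, 3);
""")

# Members that belong to more than one group: these are the rows that make
# count(*) larger than count(distinct memberid) over the whole table.
shared_members = conn.execute("""
SELECT memberid, COUNT(DISTINCT groupid) AS group_count
FROM aa
GROUP BY memberid
HAVING COUNT(DISTINCT groupid) > 1
ORDER BY memberid
""").fetchall()

print(shared_members)
```

In the sample data, members 1, 2 and 3 each appear in several groups, while member 5 appears in only one and is therefore not listed.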

SELECT and COUNT data in a specific range

I would like to check all records for a certain range (1-10) and output the quantity. If there is no record with the value in the database, 0 should also be output.
Example database:
CREATE TABLE exampledata (
ID int,
port int,
name varchar(255));
Example data:
INSERT INTO exampledata
VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'), (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
My example query would be:
SELECT
port,
count(port) as amount
FROM exampledata
GROUP BY port
Which would result in:
port  amount
1     2
2     2
3     1
4     1
8     1
But I need it to look like that:
port  amount
1     2
2     2
3     1
4     1
5     0
6     0
7     0
8     1
9     0
10    0
I have thought about a join with a database that has the values 1-10 but this does not seem efficient. Several attempts with different case and if structures were all unsuccessful...
I have prepared the data in a db<>fiddle.
The "simple" answer here would be to use an inline tally. As you just want the values 1-10, this can be achieved with a simple VALUES table construct:
SELECT V.I AS Port,
COUNT(ed.ID) AS Amount
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I)
LEFT JOIN dbo.exampledata ed ON V.I = ed.port
GROUP BY V.I;
Presumably, however, you actually have a table of ports, and so what you should be doing is LEFT JOINing from that:
SELECT P.PortID AS Port,
COUNT(ed.ID) AS Amount
FROM dbo.Port P
LEFT JOIN dbo.exampledata ed ON P.PortID = ed.port
WHERE P.PortID BETWEEN 1 AND 10
GROUP BY P.PortID;
If you don't have a table of ports (why don't you?), and you need to parametrise the values, I suggest using an actual Tally Table or Tally function; a search for these will give you a wealth of resources on how to create them.
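As a rough illustration of a parametrised range, the sketch below generates the numbers on the fly. It is not a SQL Server tally table or tally function: it runs on SQLite through Python's sqlite3 module and swaps in a recursive CTE as the number source. The function name count_per_port and the CTE name tally are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exampledata (ID INTEGER, port INTEGER, name TEXT);
INSERT INTO exampledata VALUES
  (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'),
  (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
""")


def count_per_port(conn, first_port, last_port):
    """Return (port, amount) for every port in first_port..last_port.

    Assumes first_port <= last_port. Ports without rows get amount 0,
    because COUNT(ed.ID) ignores the NULLs produced by the LEFT JOIN.
    """
    sql = """
    WITH RECURSIVE tally(i) AS (
        SELECT ?
        UNION ALL
        SELECT i + 1 FROM tally WHERE i < ?
    )
    SELECT tally.i AS port, COUNT(ed.ID) AS amount
    FROM tally
    LEFT JOIN exampledata ed ON ed.port = tally.i
    GROUP BY tally.i
    ORDER BY tally.i
    """
    return conn.execute(sql, (first_port, last_port)).fetchall()


port_counts = count_per_port(conn, 1, 10)
for port, amount in port_counts:
    print(port, amount)
```

For large ranges a persisted numbers table is usually the better fit; the recursive CTE is used here only to keep the sketch self-contained.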

Remove duplicates from single field only in rollup query

I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.
I am writing a query for Cognos 11 which summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for individual locations and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM expected value where the location is distinct).
Below is a simplified version of the query. Is this even possible or will I have to write a separate query in Cognos and make it two reports that will have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.
SELECT COALESCE(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END, 'Grand Total') "Row Labels"
,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
,COUNT(DISTINCT location) "Count of Locations"
,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
FROM audit_table
WHERE audit_datetime <= #prompt('EndDate') # AND audit_datetime >= #prompt('StartDate') #
GROUP BY ROLLUP(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END)
ORDER BY 1 ASC
This is what I'm hoping to end up with:
Thanks for any help!
Have you tried taking a look at the OVER clause in SQL? It allows you to use window functions within a result set so that you can get aggregates based on specific conditions. This would probably help, since you seem to be trying to get a summation of data based on a different grouping within a larger grouping.
For example, let's say we have the below dataset:
group1  group2  val   dateadded
------  ------  ----  ----------
1       1       1     2020-11-18
1       1       1     2020-11-20
1       2       10    2020-11-18
1       2       10    2020-11-20
2       3       100   2020-11-18
2       3       100   2020-11-20
2       4       1000  2020-11-18
2       4       1000  2020-11-20
Using a single query we can return both the sums of "val" over "group1" as well as the summation of the first (based on datetime) "val" records in "group2":
declare @table table (group1 int, group2 int, val int, dateadded datetime)
insert into @table values (1, 1, 1, getdate())
insert into @table values (1, 1, 1, dateadd(day, 1, getdate()))
insert into @table values (1, 2, 10, getdate())
insert into @table values (1, 2, 10, dateadd(day, 1, getdate()))
insert into @table values (2, 3, 100, getdate())
insert into @table values (2, 3, 100, dateadd(day, 1, getdate()))
insert into @table values (2, 4, 1000, getdate())
insert into @table values (2, 4, 1000, dateadd(day, 1, getdate()))
select t.group1, sum(t.val) as group1_sum, x.group2_first_val_sum
from @table t
inner join
(
    -- sum of the first "val" of each group2, per group1
    select group1, sum(group2_first_val) as group2_first_val_sum
    from
    (
        -- number the rows of each group2 by date
        select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
        from @table
    ) y
    where rownumber = 1
    group by group1
) x on t.group1 = x.group1
group by t.group1, x.group2_first_val_sum
This returns the below result set:
group1 group1_sum group2_first_val_sum
----------- ----------- --------------------
1 22 11
2 2200 1100
The innermost subquery in the joined table numbers the rows in the data set within each "group2", so every record gets either a "1" or a "2" in the "rownumber" column, since there are only 2 records in each "group2".
The next subquery takes that data, filters out any rows that are not the first (rownumber = 1) and sums the "val" data.
The main query gets the sum of "val" in each "group1" from the main table and then joins on the subqueried table to get the "val" sum of only the first records in each "group2".
There are more efficient ways to write this, such as moving the summation of the "group1" values to a subquery in the SELECT statement to get rid of one of the nested derived tables, but I wanted to show how to do it without subqueries in the SELECT statement.
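One possible shorter form, offered only as a sketch, numbers the rows once and then uses conditional aggregation, so the table is not joined back to itself. It runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed for row_number()), not on SQL Server or Cognos, and the table name demo and the alias numbered are made up here. The dates are the ones shown in the sample dataset above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo (group1 INTEGER, group2 INTEGER, val INTEGER, dateadded TEXT);
INSERT INTO demo VALUES
  (1, 1, 1,    '2020-11-18'), (1, 1, 1,    '2020-11-20'),
  (1, 2, 10,   '2020-11-18'), (1, 2, 10,   '2020-11-20'),
  (2, 3, 100,  '2020-11-18'), (2, 3, 100,  '2020-11-20'),
  (2, 4, 1000, '2020-11-18'), (2, 4, 1000, '2020-11-20');
""")

# Number the rows of each group2 by date, then aggregate once per group1:
# SUM(val) adds every row, the CASE expression adds only the first row
# of each group2.
rollup_rows = conn.execute("""
SELECT group1,
       SUM(val) AS group1_sum,
       SUM(CASE WHEN rownumber = 1 THEN val ELSE 0 END) AS group2_first_val_sum
FROM (
    SELECT group1, val,
           ROW_NUMBER() OVER (PARTITION BY group2 ORDER BY dateadded) AS rownumber
    FROM demo
) numbered
GROUP BY group1
ORDER BY group1
""").fetchall()

print(rollup_rows)
```

Applied to the question, the same pattern would sum variance_cost over all audits while summing expected_cost only where the row number within a location is 1; that mapping is an assumption about the audit table, not something tested here.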
Have you tried putting the distinct count at the bottom, like this?
(SUM(NVL(variance_cost,0)) / SUM(NVL(expected_cost,0))) "Variance",
COUNT(DISTINCT location) "Count of Locations"
FROM audit_table

1 distinct row having max value

This is the data I have
I need Unique ID(1 row) with max(Price). So, the output would be:
I have tried the following
select * from table a
join (select b.id,max(b.price) from table b
group by b.id) c on c.id=a.id;
This returns the data from the question unchanged, because there is no key. I also tried the other where condition, which likewise returns the original table.
You could try something like this in SQL Server:
Table
create table ex1 (
id int,
item char(1),
price int,
qty int,
usr char(2)
);
Data
insert into ex1 values
(1, 'a', 7, 1, 'ab'),
(1, 'a', 7, 2, 'ac'),
(2, 'b', 6, 1, 'ab'),
(2, 'b', 6, 1, 'av'),
(2, 'b', 5, 1, 'ab'),
(3, 'c', 5, 2, 'ab'),
(4, 'd', 4, 2, 'ac'),
(4, 'd', 3, 1, 'av');
Query
select a.* from ex1 a
join (
select id, max(price) as maxprice, min(usr) as minuser
from ex1
group by id
) c
on c.id = a.id
and a.price = c.maxprice
and a.usr = c.minuser
order by a.id, a.usr;
Result
id  item  price  qty  usr
1   a     7      1    ab
2   b     6      1    ab
3   c     5      2    ab
4   d     4      2    ac
Explanation
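In your dataset, ID 1 has 2 records with the same price. You have to decide which one you want. So, in the above example, I am showing a single record for the user whose name is lowest alphabetically. Note that min(usr) is computed over all rows of an ID, not only the rows at the maximum price, so this join returns no row for an ID whose alphabetically lowest user does not have the maximum price. That does not happen in the sample data, and the row_number() method below does not have this limitation.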
Alternate method
SQL Server has the ranking function row_number() over (...), which can be used as well:
select * from (
select row_number() over( partition by id order by id, price desc, usr) as sr, *
from ex1
) c where sr = 1;
The subquery says - give me all records from the table and give each row a serial number starting with 1 unique to each ID. The rows should be sorted by ID first, then price descending and then usr. The outer query picks out records with sr number 1.
Example here: https://rextester.com/KZCZ25396

Left join with complex join clause

I have two tables and want to left join them.
I want all entries from the account table, but only the rows from the right table that match certain criteria. If no row matches the criteria, I only want the account.
The following does not work as expected:
SELECT * FROM Account a
LEFT JOIN
Entries ef ON ef.account_id = a.account_id AND
(ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.entry_period_end_date IS NULL
OR
ef.forecast_period_end IS NULL
)
because it also gives me the rows from the entries table that are outside the requested period.
Example Data:
Account Table
AccountID | AccountName
1         | Test
2         | Foobar
3         | Test1
4         | Foobar2
Entries Table
id | AccountID | entry_period_end_date | forecast_period_end | amount
1  | 1         | 12/31/2009            | 12/31/2009          | 100
2  | 1         | NULL                  | 10/31/2009          | 150
3  | 2         | NULL                  | NULL                | 200
4  | 3         | 10/31/2009            | NULL                | 250
5  | 4         | 10/31/2009            | 10/31/2009          | 300
So the query should return (when I set startDate = 12/01/2009, endDate = 12/31/2009)
AccountID | id
1         | 1
2         | NULL
3         | NULL
4         | NULL
Thx,
Martin
If either entry_period_end_date or forecast_period_end is NULL, the row will be returned, even if your other, non-NULL column is not within the period.
Probably you meant this:
SELECT *
FROM Account a
LEFT JOIN
Entries ef
ON ef.account_id = a.account_id
AND
(
entry_period_end_date BETWEEN …
OR forecast_period_end BETWEEN …
)
, which will return all rows with either entry_period_end_date or forecast_period_end within the given period.
Update:
A test script:
CREATE TABLE Account (AccountID INT NOT NULL, AccountName VARCHAR(100) NOT NULL);
INSERT
INTO Account
VALUES
(1, 'Test'),
(2, 'Foobar'),
(3, 'Test1'),
(4, 'Foobar1');
CREATE TABLE Entries (id INT NOT NULL, AccountID INT NOT NULL, entry_period_end_date DATETIME, forecast_period_end DATETIME, amount FLOAT NOT NULL);
INSERT
INTO Entries
VALUES
(1, 1, '2009-12-31', '2009-12-31', 100),
(2, 1, NULL, '2009-10-31', 100),
(3, 2, NULL, NULL, 100),
(4, 3, '2009-10-31', NULL, 100),
(5, 4, '2009-10-31', '2009-10-31', 100);
SELECT a.*, ef.id
FROM Account a
LEFT JOIN
Entries ef
ON ef.accountID = a.accountID
AND
(
entry_period_end_date BETWEEN '2009-12-01' AND '2009-12-31'
OR forecast_period_end BETWEEN '2009-12-01' AND '2009-12-31'
);
returns the following:
1, 'Test', 1
2, 'Foobar', NULL
3, 'Test1', NULL
4, 'Foobar1', NULL
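The reason the period filter belongs in the ON clause rather than in a WHERE clause can be checked with a small sketch. It runs the test data above on SQLite through Python's sqlite3 module instead of MySQL; the parameter names period_start and period_end and the variable names are made up for this sketch, and the dates are stored as ISO text so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (AccountID INTEGER NOT NULL, AccountName TEXT NOT NULL);
INSERT INTO Account VALUES (1, 'Test'), (2, 'Foobar'), (3, 'Test1'), (4, 'Foobar1');
CREATE TABLE Entries (id INTEGER NOT NULL, AccountID INTEGER NOT NULL,
                      entry_period_end_date TEXT, forecast_period_end TEXT,
                      amount REAL NOT NULL);
INSERT INTO Entries VALUES
  (1, 1, '2009-12-31', '2009-12-31', 100),
  (2, 1, NULL,         '2009-10-31', 100),
  (3, 2, NULL,         NULL,         100),
  (4, 3, '2009-10-31', NULL,         100),
  (5, 4, '2009-10-31', '2009-10-31', 100);
""")
period = {"period_start": "2009-12-01", "period_end": "2009-12-31"}

# Filter in the ON clause: accounts without a matching entry are kept with NULL.
on_rows = conn.execute("""
SELECT a.AccountID, ef.id
FROM Account a
LEFT JOIN Entries ef
  ON ef.AccountID = a.AccountID
 AND (ef.entry_period_end_date BETWEEN :period_start AND :period_end
      OR ef.forecast_period_end BETWEEN :period_start AND :period_end)
ORDER BY a.AccountID
""", period).fetchall()

# Same filter in WHERE: it runs after the join and drops the unmatched accounts.
where_rows = conn.execute("""
SELECT a.AccountID, ef.id
FROM Account a
LEFT JOIN Entries ef ON ef.AccountID = a.AccountID
WHERE ef.entry_period_end_date BETWEEN :period_start AND :period_end
   OR ef.forecast_period_end BETWEEN :period_start AND :period_end
ORDER BY a.AccountID
""", period).fetchall()

print(on_rows)
print(where_rows)
```

With the filter in the ON clause all four accounts come back, three of them with a NULL id; with the filter in WHERE only account 1 remains.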
Edited to fix logic so end date logic is grouped together, then forecast period logic...
Now it should check for a "good" end date (null or within range), then check for a "good" forecast date (null or within range)
Since all the logic is on the Entries table, narrow it down first, then join
SELECT a.*, temp.id FROM Account a
LEFT JOIN
(
    SELECT id, account_id
    FROM Entries ef
    WHERE
    (
        -- "good" end date: within the period or NULL
        (ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
        OR
        ef.entry_period_end_date IS NULL
        )
        AND
        -- "good" forecast date: within the period or NULL
        (ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
        OR
        ef.forecast_period_end IS NULL
        )
    )
) temp
ON a.account_id = temp.account_id