Summing Results in a Table for Repeated Values - sql

I'm currently in a tricky situation that I have been unable to figure out, and I was hoping you all might be able to help me solve my issue below:
I have a data set that includes a large number of columns; however, I am only showing the columns pertinent to my issue (I renamed them and put them in an Excel document).
What I am trying to do is develop a SQL query that calculates the total number of PASS results and the number of FAIL results for a given House Name. Each Result corresponds to a specific Resident ID, and each Resident ID corresponds to a specific House Name/House ID. Unfortunately, the Room ID value needs to be in this data set, and each unique Room ID also corresponds to a specific House Name/House ID. Therefore, each Resident ID is repeated once for every unique Room ID that exists for its House Name.
For example, if there are 7 Room IDs associated with a specific House Name/House ID, each unique Resident ID associated with that House Name/House ID will be repeated 7 times, once for every unique Room ID. Therefore, the Results are also all repeated 7 times. I have attached an example of what the data looks like below.
Note: Not all the data is included here. There are a few more rows to the AAAAAA data not shown, and there are a number of other House Names/House IDs.
Any thoughts would be much appreciated!
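The question's actual data isn't shown, so the following is only a minimal sketch under assumptions: a hypothetical flattened table `flat` with HouseName, RoomID, ResidentID and a text Result of 'PASS' or 'FAIL' (the RoomID values are made up), run through Python's built-in sqlite3 module. It reproduces the repetition described above (one copy of each resident per room) and shows one way around it: collapse to one row per resident with SELECT DISTINCT before counting.

```python
import sqlite3

# Hypothetical flattened data in the shape the question describes:
# every resident row is repeated once per RoomID of the same house.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flat (HouseName TEXT, RoomID INTEGER, ResidentID INTEGER, Result TEXT)"
)
residents = [("AAAAAA", 1, "PASS"), ("AAAAAA", 2, "FAIL"), ("AAAAAA", 3, "PASS")]
rooms = {"AAAAAA": [10, 11, 12]}  # three rooms, so each resident appears 3 times
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?)",
    [(house, room, rid, result)
     for house, rid, result in residents
     for room in rooms[house]],
)

# Naive counting is inflated by the number of rooms.
naive = conn.execute(
    """SELECT HouseName,
              SUM(CASE WHEN Result = 'PASS' THEN 1 ELSE 0 END),
              SUM(CASE WHEN Result = 'FAIL' THEN 1 ELSE 0 END)
       FROM flat GROUP BY HouseName"""
).fetchall()

# Collapse the repetition first (one row per resident), then count.
deduped = conn.execute(
    """SELECT HouseName,
              SUM(CASE WHEN Result = 'PASS' THEN 1 ELSE 0 END) AS Passed,
              SUM(CASE WHEN Result = 'FAIL' THEN 1 ELSE 0 END) AS Failed
       FROM (SELECT DISTINCT HouseName, ResidentID, Result FROM flat) AS r
       GROUP BY HouseName"""
).fetchall()

print(naive)    # [('AAAAAA', 6, 3)]
print(deduped)  # [('AAAAAA', 2, 1)]
```

The derived table carries an alias (`AS r`) so the same SQL text is also acceptable to SQL Server, which requires one.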

What you are looking for is GROUP BY.
Without looking at your data it is hard to come up with the exact query, but I have created some test data.
create table House (HouseId int, HouseName nvarchar(max));
insert into House (HouseId, HouseName) values (1,'House A'), (2, 'House B'), (3,'House C');
create table Room (RoomId int, RoomName nvarchar(max), HouseId int);
insert into Room (RoomId, RoomName, HouseId)
values
(1,'Room 1 in house A', 1), (2,'Room 2 in house A', 1),
(3,'Room 3 in house B', 2),(4,'Room 4 in house B', 2),
(5,'Room 5 in house C', 3),(6,'Room 6 in house C', 3)
create table Resident (ResidentId int, ResidentName nvarchar(max), RoomId int, Result int);
insert into Resident (ResidentId, ResidentName, RoomId, Result)
values
-- House A = 4 passed, 0 failed
(1, 'Resident 1 in Room 1', 1, 82), (2, 'Resident 2 in Room 1', 1, 76),
(3, 'Resident 3 in Room 2', 2, 91), (4, 'Resident 4 in Room 2', 2, 67),
-- House B = 2 passed, 2 failed
(5, 'Resident 5 in Room 3', 3, 60), (6, 'Resident 6 in Room 3', 3, 64),
(7, 'Resident 7 in Room 4', 4, 28), (8, 'Resident 8 in Room 4', 4, 42),
-- House C = 3 passed, 1 failed
(9, 'Resident 9 in Room 5', 5, 99), (10, 'Resident 10 in Room 5', 5, 57),
(11, 'Resident 11 in Room 6', 6, 75), (12, 'Resident 12 in Room 6', 6, 38);
Then your query would look something like:
select
HouseName,
[Passed] = SUM(x.Passed),
[Failed] = SUM(x.Failed)
from
Resident re
outer apply (
--// Logic to determine if they passed or failed
--// I arbitrarily chose the number 50 to be the threshold
select [Passed] = case when re.Result >= 50 then 1 else 0 end,
[Failed] = case when re.Result < 50 then 1 else 0 end
) x
inner join Room r on r.RoomId = re.RoomId
inner join House h on h.HouseId = r.HouseId
group by
h.HouseName
Here is a fiddle: http://sqlfiddle.com/#!18/30894/1
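SQLite has no OUTER APPLY, so to try the answer's logic locally the pass/fail flags can be moved into the aggregates as conditional sums. This sketch uses Python's sqlite3 module with the same test data (resident IDs 11 and 12 for Room 6) and the same arbitrary threshold of 50; the SUM(CASE ...) form is also valid in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE House (HouseId INTEGER, HouseName TEXT);
INSERT INTO House VALUES (1, 'House A'), (2, 'House B'), (3, 'House C');
CREATE TABLE Room (RoomId INTEGER, RoomName TEXT, HouseId INTEGER);
INSERT INTO Room VALUES
  (1, 'Room 1 in house A', 1), (2, 'Room 2 in house A', 1),
  (3, 'Room 3 in house B', 2), (4, 'Room 4 in house B', 2),
  (5, 'Room 5 in house C', 3), (6, 'Room 6 in house C', 3);
CREATE TABLE Resident (ResidentId INTEGER, ResidentName TEXT, RoomId INTEGER, Result INTEGER);
INSERT INTO Resident VALUES
  (1, 'Resident 1 in Room 1', 1, 82), (2, 'Resident 2 in Room 1', 1, 76),
  (3, 'Resident 3 in Room 2', 2, 91), (4, 'Resident 4 in Room 2', 2, 67),
  (5, 'Resident 5 in Room 3', 3, 60), (6, 'Resident 6 in Room 3', 3, 64),
  (7, 'Resident 7 in Room 4', 4, 28), (8, 'Resident 8 in Room 4', 4, 42),
  (9, 'Resident 9 in Room 5', 5, 99), (10, 'Resident 10 in Room 5', 5, 57),
  (11, 'Resident 11 in Room 6', 6, 75), (12, 'Resident 12 in Room 6', 6, 38);
""")

# Conditional aggregation: CASE inside SUM replaces the OUTER APPLY flags.
rows = conn.execute("""
SELECT h.HouseName,
       SUM(CASE WHEN re.Result >= 50 THEN 1 ELSE 0 END) AS Passed,
       SUM(CASE WHEN re.Result <  50 THEN 1 ELSE 0 END) AS Failed
FROM Resident re
JOIN Room  r ON r.RoomId  = re.RoomId
JOIN House h ON h.HouseId = r.HouseId
GROUP BY h.HouseName
ORDER BY h.HouseName
""").fetchall()
print(rows)  # [('House A', 4, 0), ('House B', 2, 2), ('House C', 3, 1)]
```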

Related

SELECT and COUNT data in a specific range

I would like to count the records for each value in a certain range (1-10) and output the quantity. If no record has a given value, 0 should be output for it.
Example table:
CREATE TABLE exampledata (
ID int,
port int,
name varchar(255));
Example data:
INSERT INTO exampledata
VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'), (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
My example query would be:
SELECT
port,
count(port) as amount
FROM exampledata
GROUP BY port
Which would result in:
| port | amount |
| ---- | ------ |
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 8 | 1 |
But I need it to look like this:
| port | amount |
| ---- | ------ |
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
I have thought about a join with a table that contains the values 1-10, but this does not seem efficient. Several attempts with different CASE and IF structures were all unsuccessful.
I have prepared the data in a db<>fiddle.
The "simple" answer here would be to use an inline tally. As you just want the values 1-10, this can be achieved with a simple VALUES table constructor:
SELECT V.I AS Port,
COUNT(ed.ID) AS Amount
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I)
LEFT JOIN dbo.exampledata ed ON V.I = ed.port
GROUP BY V.I;
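The same inline-tally idea can be tried in SQLite through Python's sqlite3 module. SQLite does not accept the `V(I)` column-alias syntax; it names the columns of a bare VALUES list `column1`, `column2`, and so on, so `V.column1` plays the role of `V.I` below. The key detail is `COUNT(ed.ID)`, which ignores the NULLs produced by the LEFT JOIN and therefore yields 0 for missing ports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exampledata (ID INTEGER, port INTEGER, name TEXT);
INSERT INTO exampledata VALUES
  (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'),
  (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
""")

# V.column1 is SQLite's default name for the first column of a VALUES list.
rows = conn.execute("""
SELECT V.column1 AS port, COUNT(ed.ID) AS amount
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS V
LEFT JOIN exampledata ed ON V.column1 = ed.port
GROUP BY V.column1
ORDER BY V.column1
""").fetchall()
print(rows)
# [(1, 2), (2, 2), (3, 1), (4, 1), (5, 0), (6, 0), (7, 0), (8, 1), (9, 0), (10, 0)]
```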
Presumably, however, you actually have a table of ports, and so what you should be doing is LEFT JOINing from that:
SELECT P.PortID AS Port,
COUNT(ed.ID) AS Amount
FROM dbo.Port P
LEFT JOIN dbo.exampledata ed ON P.PortID = ed.port
WHERE P.PortID BETWEEN 1 AND 10
GROUP BY P.PortID;
If you don't have a table of ports (why don't you?) and you need to parametrise the values, I suggest using an actual Tally Table or Tally function; searching for these will turn up a wealth of resources on how to create them.
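A sketch of the parametrised variant, assuming a hypothetical permanent table `Tally` with one integer per row (here 1 to 10000) and a hypothetical helper `port_counts`; it is run in SQLite through Python's sqlite3 module, with `?` placeholders standing in for the range bounds. Filtering on the tally side in the WHERE clause keeps the LEFT JOIN semantics intact, so ports without data still appear with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exampledata (ID INTEGER, port INTEGER, name TEXT);
INSERT INTO exampledata VALUES
  (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'),
  (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
-- Hypothetical permanent tally table: one row per integer.
CREATE TABLE Tally (N INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO Tally (N) VALUES (?)", [(n,) for n in range(1, 10001)])

def port_counts(lo, hi):
    """Count exampledata rows per port for every port in [lo, hi], including zeros."""
    return conn.execute(
        """SELECT t.N AS port, COUNT(ed.ID) AS amount
           FROM Tally t
           LEFT JOIN exampledata ed ON ed.port = t.N
           WHERE t.N BETWEEN ? AND ?
           GROUP BY t.N
           ORDER BY t.N""",
        (lo, hi),
    ).fetchall()

print(port_counts(7, 9))  # [(7, 0), (8, 1), (9, 0)]
```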

sqlite nested query with division

I'm a bit new to SQL and am wondering about the best way to do this. Basically, one query returns the denominator, and the outer query needs to return numerator/denominator as a percentage. Both statements use essentially the same tables.
create table games(
id integer NOT NULL,
name TEXT NOT NULL,
category TEXT NOT NULL
);
create table game_sets(
id integer NOT NULL,
name TEXT NOT NULL,
theme_id integer NOT NULL
);
INSERT INTO games (id, name, category)
VALUES (1, 'star wars', 'top game'),
(2, 'pokemon', 'top game'),
(3, 'zelda', 'top game'),
(4, 'crazy cats', 'sucky game');
INSERT INTO game_sets(id, name, theme_id)
VALUES (1, 'star wars set 1', 1),
(2, 'star wars set 2', 1),
(3, 'star wars set 3', 1),
(4, 'pikachu set 1', 2),
(5, 'narf set 1', 4),
(6, 'narf set 2', 4),
(7, 'narf set 1', 4),
(8, 'narf set 1', 4),
(9, 'narf set 1', 4),
(10, 'narf set 1', 4);
CREATE VIEW top_games AS
SELECT id, name
FROM games
WHERE category ='top game';
--I hard-coded 200 below, but it needs to be dynamic
select top_games.name as theme, printf('%.2f', cast(count(game_sets.name) as float)/200) as num_sets_percent from top_games
join game_sets
where top_games.id = game_sets.theme_id
group by top_games.id
order by count(game_sets.name) desc
limit 2;
--below is the number I need the first query to divide by
--it is hard-coded above, but it needs to come dynamically from this query
(select count(game_sets.name) as num_sets from game_sets
join top_games
where top_games.id = game_sets.theme_id) as divide_by_this
Expected output:
star wars, 0.3 (because there are 3 star wars sets out of 10 total sets, and star wars is a top game)
pokemon, 0.1 (because there is 1 pokemon set out of 10 total sets, and pokemon is also a top game)
Finally, the result is limited to the top 2, so zelda doesn't show up.
If you have SQLite 3.25.0+ you can use window functions:
select distinct
g.name,
1.0 * count(g.id) over (partition by g.id) / count() over () num_sets_percent
from game_sets s left join top_games g
on s.theme_id = g.id
order by num_sets_percent desc
limit 2
See the demo.
Results:
| name | num_sets_percent |
| --------- | ---------------- |
| star wars | 0.3 |
| pokemon | 0.1 |
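For SQLite versions older than 3.25.0, which lack window functions, the asker's original nested approach can be made dynamic by moving the denominator into an uncorrelated scalar subquery instead of hard-coding it. This is a sketch run through Python's sqlite3 module with the question's data; multiplying by `1.0` forces real rather than integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (id INTEGER NOT NULL, name TEXT NOT NULL, category TEXT NOT NULL);
CREATE TABLE game_sets (id INTEGER NOT NULL, name TEXT NOT NULL, theme_id INTEGER NOT NULL);
INSERT INTO games VALUES (1, 'star wars', 'top game'), (2, 'pokemon', 'top game'),
                         (3, 'zelda', 'top game'), (4, 'crazy cats', 'sucky game');
INSERT INTO game_sets VALUES
  (1, 'star wars set 1', 1), (2, 'star wars set 2', 1), (3, 'star wars set 3', 1),
  (4, 'pikachu set 1', 2), (5, 'narf set 1', 4), (6, 'narf set 2', 4),
  (7, 'narf set 1', 4), (8, 'narf set 1', 4), (9, 'narf set 1', 4), (10, 'narf set 1', 4);
CREATE VIEW top_games AS SELECT id, name FROM games WHERE category = 'top game';
""")

# The denominator (all sets) comes from a scalar subquery, not a constant.
rows = conn.execute("""
SELECT g.name AS theme,
       1.0 * COUNT(s.id) / (SELECT COUNT(*) FROM game_sets) AS num_sets_percent
FROM top_games g
JOIN game_sets s ON s.theme_id = g.id
GROUP BY g.id, g.name
ORDER BY num_sets_percent DESC
LIMIT 2
""").fetchall()
print(rows)  # [('star wars', 0.3), ('pokemon', 0.1)]
```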

How to specify a linear programming-like constraint (i.e. max number of rows for a dimension's attributes) in SQL server?

I'm looking to assign unique person IDs to marketing programs, but I need to optimize based on each person's Probability Score (some people can be sent to multiple programs, some to only one), and I have constraints such as a budgeted mail quantity for each program.
I'm using SQL Server and can put each ID into its highest-scoring program using row_number() over(partition by PersonID order by ProbabilityScore desc). However, I need to return a table in which each ID is assigned to a program, and I'm not sure how to add the maximum mail quantity constraint specific to each program. I've looked into CHECK constraints, but I'm not sure they are applicable.
create table test_marketing_table(
PersonID int,
MarketingProgram varchar(255),
ProbabilityScore real
);
insert into test_marketing_table (PersonID, MarketingProgram, ProbabilityScore)
values (1, 'A', 0.07)
,(1, 'B', 0.06)
,(1, 'C', 0.02)
,(2, 'A', 0.02)
,(3, 'B', 0.08)
,(3, 'C', 0.13)
,(4, 'C', 0.02)
,(5, 'A', 0.04)
,(6, 'B', 0.045)
,(6, 'C', 0.09);
--this section assigns everyone to their highest scoring program,
--but this isn't necessarily what I need
with x
as
(
select *, row_number()over(partition by PersonID order by ProbabilityScore desc) as PersonScoreRank
from test_marketing_table
)
select *
from x
where PersonScoreRank = 1;
I also need to specify some constraints: at most two C packages, one A package and one B package can be sent. How can I reassign the IDs to programs while still using the highest probability score left available?
The final result should look like:
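| PersonID | MarketingProgram | ProbabilityScore | PersonScoreRank |
| -------- | ---------------- | ---------------- | --------------- |
| 3 | C | 0.13 | 1 |
| 6 | C | 0.09 | 1 |
| 1 | A | 0.07 | 1 |
| 6 | B | 0.045 | 2 |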
You need to rethink your ROW_NUMBER() expression based on your actual need (rank within each program rather than within each person), and you should also have a table of marketing programs to make this work efficiently. The following covers the basic ideas you need to incorporate to perform the filtering efficiently.
MarketingPrograms Table
CREATE TABLE MarketingPrograms (
ProgramID varchar(10),
PeopleDesired int
)
Populate the MarketingPrograms Table
INSERT INTO MarketingPrograms (ProgramID, PeopleDesired) Values
('A', 1),
('B', 1),
('C', 2)
Use the MarketingPrograms Table
with x as (
select *,
row_number()over(partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
from test_marketing_table
)
select *
from x
INNER JOIN MarketingPrograms m
ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
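To check what the per-program quota filter actually returns, here is the answer's query (ranked within MarketingProgram) run in SQLite 3.25+ through Python's sqlite3 module with the question's sample data. On this data it selects person 1 for A, person 3 for B (score 0.08) and persons 3 and 6 for C, so it enforces each program's cap but does not by itself reproduce the asker's expected table, which gives B to person 6; any rule limiting how many programs one person may receive would have to be added on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_marketing_table (PersonID INTEGER, MarketingProgram TEXT, ProbabilityScore REAL);
INSERT INTO test_marketing_table VALUES
  (1, 'A', 0.07), (1, 'B', 0.06), (1, 'C', 0.02), (2, 'A', 0.02), (3, 'B', 0.08),
  (3, 'C', 0.13), (4, 'C', 0.02), (5, 'A', 0.04), (6, 'B', 0.045), (6, 'C', 0.09);
CREATE TABLE MarketingPrograms (ProgramID TEXT, PeopleDesired INTEGER);
INSERT INTO MarketingPrograms VALUES ('A', 1), ('B', 1), ('C', 2);
""")

# Rank people within each program, then keep only as many as the program's quota.
rows = conn.execute("""
WITH x AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY MarketingProgram
                               ORDER BY ProbabilityScore DESC) AS ProgramScoreRank
  FROM test_marketing_table
)
SELECT x.PersonID, x.MarketingProgram, x.ProbabilityScore, x.ProgramScoreRank
FROM x
JOIN MarketingPrograms m ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
ORDER BY x.MarketingProgram, x.ProgramScoreRank
""").fetchall()
for row in rows:
    print(row)
```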

SQL Server Unique Index across tables

It's possible to create a unique index across tables, basically using a view and a unique index.
I have a problem though.
Given two (or three) tables.
Company
- Id
- Name
Brand
- Id
- CompanyId
- Name
- Code
Product
- Id
- BrandId
- Name
- Code
I want to ensure that the combinations of:
Company / Brand.Code
and
Company / Product.Code (via the product's Brand)
are unique.
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT b.CompanyId, b.Code
FROM dbo.Brand b
UNION ALL
SELECT b.CompanyId, p.Code
FROM dbo.Product p
INNER JOIN dbo.Brand b ON p.BrandId = b.Id
The creation of the view is successful.
CREATE UNIQUE CLUSTERED INDEX UIX_UniquePrefixCode
ON TestView(CompanyId, Code)
This fails because of the UNION
How can I solve this scenario?
Basically Code for both Brand/Product cannot be duplicated within a company.
Notes:
Error that I get is:
Msg 10116, Level 16, State 1, Line 3 Cannot create index on view
'XXXX.dbo.TestView' because it contains one or more UNION, INTERSECT,
or EXCEPT operators. Consider creating a separate indexed view for
each query that is an input to the UNION, INTERSECT, or EXCEPT
operators of the original view.
Notes 2:
When I'm using the sub query I get the following error:
Msg 10109, Level 16, State 1, Line 3 Cannot create index on view
"XXXX.dbo.TestView" because it references derived table "a"
(defined by SELECT statement in FROM clause). Consider removing the
reference to the derived table or not indexing the view.
Notes 3:
So, given the Brands and Products from @spaghettidba's answer:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
The expectation is that, when the results are expanded, every Company + Brand Code and Company + Product Code combination is unique:
Company / Brand|Product Code
1 / 100 <-- Brand
1 / 400 <-- Brand
1 / 1 <-- Product
1 / 2 <-- Product
1 / 5 <-- Product
2 / 200 <-- Brand
3 / 300 <-- Brand
3 / 500 <-- Brand
3 / 3 <-- Product
3 / 301 <-- Product
There are no duplicates. Now suppose we have a brand and a product with the same code:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(6, 1, 'Brand 6', 999)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1006, 2, 'Product 1006', 999)
The product belongs to a different company, so we get:
Company / Brand|Product Code
1 / 999 <-- Brand
2 / 999 <-- Product
This is unique.
But suppose we have two brands and one product:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(7, 1, 'Brand 7', 777),
(8, 1, 'Brand 8', 888)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1007, 8, 'Product 1008', 777)
This would produce
Company / Brand|Product Code
1 / 777 <-- Brand
1 / 888 <-- Brand
1 / 777 <-- Product
This would not be allowed.
Hope that makes sense.
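Before creating any index it can help to list existing violations of this rule. The sketch below is a plain detection query, not an enforcement mechanism: it expands brands and products to (CompanyId, Code) pairs with UNION ALL, exactly like the listing above, and reports pairs that occur more than once. It runs in SQLite via Python's sqlite3 module; the names `VIOLATIONS`, `before` and `after` are illustrative. Because it works at the company level, it also flags the Brand 7 / Product 1007 case, where the product's code matches a different brand of the same company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brand (Id INTEGER PRIMARY KEY, CompanyId INTEGER, Name TEXT, Code INTEGER);
CREATE TABLE Product (Id INTEGER PRIMARY KEY, BrandId INTEGER, Name TEXT, Code INTEGER);
INSERT INTO Brand VALUES (1, 1, 'Brand 1', 100), (2, 2, 'Brand 2', 200),
  (3, 3, 'Brand 3', 300), (4, 1, 'Brand 4', 400), (5, 3, 'Brand 5', 500);
INSERT INTO Product VALUES (1001, 1, 'Product 1001', 1), (1002, 1, 'Product 1002', 2),
  (1003, 3, 'Product 1003', 3), (1004, 3, 'Product 1004', 301), (1005, 4, 'Product 1005', 5);
""")

# Expand to (CompanyId, Code) pairs and report any pair seen more than once.
VIOLATIONS = """
SELECT CompanyId, Code, COUNT(*) AS n
FROM (SELECT b.CompanyId, b.Code FROM Brand b
      UNION ALL
      SELECT b.CompanyId, p.Code FROM Product p JOIN Brand b ON p.BrandId = b.Id) AS pairs
GROUP BY CompanyId, Code
HAVING COUNT(*) > 1
"""
before = conn.execute(VIOLATIONS).fetchall()

# The "two brands and one product" case from the question.
conn.executescript("""
INSERT INTO Brand VALUES (7, 1, 'Brand 7', 777), (8, 1, 'Brand 8', 888);
INSERT INTO Product VALUES (1007, 8, 'Product 1008', 777);
""")
after = conn.execute(VIOLATIONS).fetchall()
print(before)  # []
print(after)   # [(1, 777, 2)]
```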
Notes 4:
@spaghettidba's answer solved the cross-table problem; the second issue was duplicates within the Brand table itself.
I've managed to solve this by creating a separate index on the brand table:
CREATE UNIQUE NONCLUSTERED INDEX UIX_UniquePrefixCode23
ON Brand(CompanyId, Code)
WHERE Code IS NOT NULL;
I blogged about a similar solution back in 2011. You can find the post here:
http://spaghettidba.com/2011/08/03/enforcing-complex-constraints-with-indexed-views/
Basically, you have to create a table that contains exactly two rows and you will use that table in CROSS JOIN to duplicate the rows that violate your business rules.
In your case, the indexed view is a bit harder to code because of the way you expressed the business rule. In fact, checking uniqueness on the UNIONed tables through an indexed view is not permitted, as you already have seen.
However, the constraint can be expressed in a different way: since the companyId is implied by the brand, you can avoid the UNION and simply use a JOIN between product and brand and check uniqueness by adding the JOIN predicate on the code itself.
You didn't provide any sample data, so I hope you won't mind if I create some for you:
CREATE TABLE Company (
Id int PRIMARY KEY,
Name varchar(50)
)
CREATE TABLE Brand (
Id int PRIMARY KEY,
CompanyId int,
Name varchar(50),
Code int
)
CREATE TABLE Product (
Id int PRIMARY KEY,
BrandId int,
Name varchar(50),
Code int
)
GO
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES (1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
As far as I can tell, no rows violating the business rules are present yet.
Now we need the indexed view and the two rows table:
CREATE TABLE tworows (
n int
)
INSERT INTO tworows values (1),(2)
GO
And here's the indexed view:
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT 1 AS one
FROM dbo.Brand b
INNER JOIN dbo.Product p
ON p.BrandId = b.Id
AND p.code = b.code
CROSS JOIN dbo.tworows AS t
GO
CREATE UNIQUE CLUSTERED INDEX IX_TestView ON dbo.TestView(one)
This update should break the business rules:
UPDATE product SET code = 300 WHERE code = 301
In fact you get an error:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.TestView' with unique index 'IX_TestView'. The duplicate key value is (1).
The statement has been terminated.
Hope this helps.

AVG and COUNT in SQL Server

I have a rating system in which any person may review others. Each person can be rated by the same person more than once. For the calculation of averages, I would like to include only the most recent values.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <- ignored because there is a newer rating of person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all evaluatees, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
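The NOT EXISTS query can be tried locally in SQLite through Python's sqlite3 module. This sketch uses the sample ratings above with ISO-8601 date strings, which compare correctly as text. One dialect difference to keep in mind: SQLite's AVG always returns a floating-point value, whereas SQL Server's AVG over an int column returns an int.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ratings (Evaluator TEXT, Evaluatee TEXT, Rating INTEGER, Date TEXT);
INSERT INTO Ratings VALUES
  ('Person 1', 'Person 2', 5, '2011-02-01'),
  ('Person 1', 'Person 2', 2, '2011-03-01'),
  ('Person 2', 'Person 1', 6, '2011-02-01'),
  ('Person 2', 'Person 1', 3, '2011-03-01'),
  ('Person 3', 'Person 1', 5, '2011-05-01');
""")

# Keep a rating only if the same evaluator has no later rating of the same evaluatee.
rows = conn.execute("""
SELECT Evaluatee, AVG(Rating) AS AvgRating
FROM Ratings r1
WHERE NOT EXISTS (SELECT 1 FROM Ratings r2
                  WHERE r2.Evaluatee = r1.Evaluatee
                    AND r2.Evaluator = r1.Evaluator
                    AND r2.Date > r1.Date)
GROUP BY Evaluatee
ORDER BY Evaluatee
""").fetchall()
print(rows)  # [('Person 1', 4.0), ('Person 2', 2.0)]
```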
This might look like it implicitly assumes that no two entries have the same date, but that is actually not a problem. If such entries can exist, you cannot decide which of them was made later anyway; you could only choose randomly between them. As shown here, both are included and averaged, which might be the best solution you can get for that border case (although it slightly favors that evaluator, giving them two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index; the obvious primary key choice here is the columns (Evaluator, Evaluatee, Date).
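A minimal sketch of that suggestion in SQLite, via Python's sqlite3 module: with (Evaluator, Evaluatee, Date) as the primary key, a second rating of the same person by the same person on the same date is rejected, so the tie case cannot arise. The table layout mirrors the one assumed above; the `tie_rejected` flag is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# (Evaluator, Evaluatee, Date) as primary key: at most one rating per pair per date.
conn.execute("""
CREATE TABLE Ratings (
  Evaluator TEXT NOT NULL,
  Evaluatee TEXT NOT NULL,
  Rating    INTEGER,
  Date      TEXT NOT NULL,
  PRIMARY KEY (Evaluator, Evaluatee, Date)
)""")
conn.execute("INSERT INTO Ratings VALUES ('Person 1', 'Person 2', 5, '2011-02-01')")

try:
    # Same evaluator, same evaluatee, same date: violates the primary key.
    conn.execute("INSERT INTO Ratings VALUES ('Person 1', 'Person 2', 2, '2011-02-01')")
    tie_rejected = False
except sqlite3.IntegrityError:
    tie_rejected = True

print(tie_rejected)  # True
```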
declare @T table
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into @T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from @T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
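The ROW_NUMBER() approach can also be run in SQLite 3.25+ through Python's sqlite3 module; the table variable @T becomes an ordinary table `T`, and the dates are written as ISO-8601 strings. The second part adds a same-date tie to show how this differs from the NOT EXISTS version: ROW_NUMBER() keeps exactly one row per (evaluatee, evaluator) pair even on a tie, although which of the tied rows survives is not defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (evaluator INTEGER, evaluatee INTEGER, rating INTEGER, ratedate TEXT);
INSERT INTO T VALUES
  (1, 2, 5, '2011-01-02'), (1, 2, 2, '2011-01-03'),
  (2, 1, 6, '2011-01-02'), (2, 1, 3, '2011-01-03'),
  (3, 1, 5, '2011-01-05');
""")

# Latest rating per (evaluatee, evaluator) pair.
LATEST = """
SELECT evaluator, evaluatee, rating FROM (
  SELECT evaluator, evaluatee, rating,
         ROW_NUMBER() OVER (PARTITION BY evaluatee, evaluator
                            ORDER BY ratedate DESC) AS rn
  FROM T) AS t
WHERE t.rn = 1
"""
averages = conn.execute(
    f"SELECT evaluatee, AVG(rating) FROM ({LATEST}) GROUP BY evaluatee ORDER BY evaluatee"
).fetchall()
print(averages)  # [(1, 4.0), (2, 2.0)]

# Add a same-date tie for evaluator 3: still only one row is kept for that pair.
conn.execute("INSERT INTO T VALUES (3, 1, 1, '2011-01-05')")
kept = conn.execute(f"SELECT COUNT(*) FROM ({LATEST}) WHERE evaluator = 3").fetchone()[0]
print(kept)  # 1
```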
This is possible to do, but it can be REALLY hairy: SQL was not designed to compare rows, only columns. I would strongly recommend keeping an additional table containing only the most recent data and storing the rest in an archive table.
If you must do it this way, I'll need the full table structure to write a query for this. In particular, I need to know which indexes are unique.