It's possible to create a unique index across tables, basically using a view and a unique index.
I have a problem though.
Given two (or three) tables.
Company
- Id
- Name
Brand
- Id
- CompanyId
- Name
- Code
Product
- Id
- BrandId
- Name
- Code
I want to ensure that the combinations of:
Company / Brand.Code
and
Company / Brand / Product.Code
are unique.
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT b.CompanyId, b.Code
FROM dbo.Brand b
UNION ALL
SELECT b.CompanyId, p.Code
FROM dbo.Product p
INNER JOIN dbo.Brand b ON p.BrandId = b.Id
The creation of the view is successful.
CREATE UNIQUE CLUSTERED INDEX UIX_UniquePrefixCode
ON TestView(CompanyId, Code)
This fails because of the UNION
How can I solve this scenario?
Basically Code for both Brand/Product cannot be duplicated within a company.
Notes:
Error that I get is:
Msg 10116, Level 16, State 1, Line 3 Cannot create index on view
'XXXX.dbo.TestView' because it contains one or more UNION, INTERSECT,
or EXCEPT operators. Consider creating a separate indexed view for
each query that is an input to the UNION, INTERSECT, or EXCEPT
operators of the original view.
Notes 2:
When I'm using the sub query I get the following error:
Msg 10109, Level 16, State 1, Line 3 Cannot create index on view
"XXXX.dbo.TestView" because it references derived table "a"
(defined by SELECT statement in FROM clause). Consider removing the
reference to the derived table or not indexing the view.
Notes 3:
So, given the brands and products from @spaghettidba's answer:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
The expectation is, the Brand Code + Company or Product Code + Company is unique, if we expand the results out.
Company / Brand|Product Code
1 / 100 <-- Brand
1 / 400 <-- Brand
1 / 1 <-- Product
1 / 2 <-- Product
1 / 5 <-- Product
2 / 200 <-- Brand
3 / 300 <-- Brand
3 / 500 <-- Brand
3 / 3 <-- Product
3 / 301 <-- Product
There are no duplicates. Now suppose we have a brand and a product with the same code:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(6, 1, 'Brand 6', 999)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1006, 2, 'Product 1006', 999)
The product belongs to a different Company, so we get
Company / Brand|Product Code
1 / 999 <-- Brand
2 / 999 <-- Product
This is unique.
But if you have 2 brands, and 1 product.
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(7, 1, 'Brand 7', 777),
(8, 1, 'Brand 8', 888)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1007, 8, 'Product 1007', 777)
This would produce
Company / Brand|Product Code
1 / 777 <-- Brand
1 / 888 <-- Brand
1 / 777 <-- Product
This would not be allowed.
Hope that makes sense.
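The rule described in Notes 3 can also be checked after the fact with a plain query. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server and a reduced copy of the sample rows; it only detects violations and does not enforce anything.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brand   (Id INTEGER PRIMARY KEY, CompanyId INTEGER, Name TEXT, Code INTEGER);
CREATE TABLE Product (Id INTEGER PRIMARY KEY, BrandId INTEGER, Name TEXT, Code INTEGER);
INSERT INTO Brand VALUES
    (2, 2, 'Brand 2', 200),
    (6, 1, 'Brand 6', 999),
    (7, 1, 'Brand 7', 777),
    (8, 1, 'Brand 8', 888);
INSERT INTO Product VALUES
    (1006, 2, 'Product 1006', 999),  -- company 2: no clash with Brand 6 (company 1)
    (1007, 8, 'Product 1007', 777);  -- company 1: clashes with Brand 7
""")

# Expand both tables to (CompanyId, Code) pairs, then report pairs seen more than once.
DUPLICATES_SQL = """
SELECT CompanyId, Code, COUNT(*) AS Occurrences
FROM (
    SELECT b.CompanyId, b.Code
    FROM Brand b
    UNION ALL
    SELECT b.CompanyId, p.Code
    FROM Product p
    INNER JOIN Brand b ON p.BrandId = b.Id
) AS AllCodes
GROUP BY CompanyId, Code
HAVING COUNT(*) > 1
ORDER BY CompanyId, Code
"""
duplicates = conn.execute(DUPLICATES_SQL).fetchall()
print(duplicates)
```

Only the pair (company 1, code 777) is reported, which matches the case that should not be allowed.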
Notes 4:
@spaghettidba's answer solved the cross-table problem. The second issue was duplicates within the Brand table itself.
I've managed to solve this by creating a separate index on the brand table:
CREATE UNIQUE NONCLUSTERED INDEX UIX_UniquePrefixCode23
ON Brand(CompanyId, Code)
WHERE Code IS NOT NULL;
I blogged about a similar solution back in 2011. You can find the post here:
http://spaghettidba.com/2011/08/03/enforcing-complex-constraints-with-indexed-views/
Basically, you have to create a table that contains exactly two rows and you will use that table in CROSS JOIN to duplicate the rows that violate your business rules.
In your case, the indexed view is a bit harder to code because of the way you expressed the business rule. In fact, checking uniqueness on the UNIONed tables through an indexed view is not permitted, as you already have seen.
However, the constraint can be expressed in a different way: since the companyId is implied by the brand, you can avoid the UNION and simply use a JOIN between product and brand and check uniqueness by adding the JOIN predicate on the code itself.
You didn't provide any sample data, so I hope you won't mind if I do it for you:
CREATE TABLE Company (
Id int PRIMARY KEY,
Name varchar(50)
)
CREATE TABLE Brand (
Id int PRIMARY KEY,
CompanyId int,
Name varchar(50),
Code int
)
CREATE TABLE Product (
Id int PRIMARY KEY,
BrandId int,
Name varchar(50),
Code int
)
GO
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES (1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
As far as I can tell, no rows violating the business rules are present yet.
Now we need the indexed view and the two rows table:
CREATE TABLE tworows (
n int
)
INSERT INTO tworows values (1),(2)
GO
And here's the indexed view:
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT 1 AS one
FROM dbo.Brand b
INNER JOIN dbo.Product p
ON p.BrandId = b.Id
AND p.code = b.code
CROSS JOIN dbo.tworows AS t
GO
CREATE UNIQUE CLUSTERED INDEX IX_TestView ON dbo.TestView(one)
This update should break the business rules:
UPDATE product SET code = 300 WHERE code = 301
In fact you get an error:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.TestView' with unique index 'IX_TestView'. The duplicate key value is (1).
The statement has been terminated.
Hope this helps.
Related
I'm currently in a tricky situation that I have been unable to figure out, and I was hoping you all might be able to help me solve my issue below:
I have a data set that includes a large number of columns; however, I am only going to show the columns pertinent to my issue (I renamed them and put them in an Excel doc).
What I am trying to do is develop a SQL query to calculate the total amount of PASS results and then the amount of FAIL Results for a given House Name. Each Result corresponds with a specific Resident ID and each Resident ID corresponds with a specific House Name/House ID. Unfortunately, the value Room ID needs to be in this data set, and each unique Room ID also corresponds with a specific House Name/House ID. Therefore, for every unique Room ID that exists for a given House Name, the Resident ID is being repeated.
For Example, if there are 7 Room IDs associated with a specific House Name/House ID, each unique Resident ID associated with that specific House Name/House ID will be repeated 7 times, once for every unique Room ID. Therefore, the Results are also all repeated 7 times. I have attached an example of what the data looks like below.
Note: Not all the data is included here. There are a few more rows to the AAAAAA data not shown, and there are a number of other House Names/House IDs.
Any thoughts would be much appreciated!
What you are looking for is GROUP BY.
Without looking at your data it is hard to come up with the exact query, but I have created some test data.
create table House (HouseId int, HouseName nvarchar(max));
insert into House (HouseId, HouseName) values (1,'House A'), (2, 'House B'), (3,'House C');
create table Room (RoomId int, RoomName nvarchar(max), HouseId int);
insert into Room (RoomId, RoomName, HouseId)
values
(1,'Room 1 in house A', 1), (2,'Room 2 in house A', 1),
(3,'Room 3 in house B', 2),(4,'Room 4 in house B', 2),
(5,'Room 5 in house C', 3),(6,'Room 6 in house C', 3)
create table Resident (ResidentId int, ResidentName nvarchar(max), RoomId int, Result int);
insert into Resident (ResidentId, ResidentName, RoomId, Result)
values
-- House A = 4 passed, 0 failed
(1, 'Resident 1 in Room 1', 1, 82), (2, 'Resident 2 in Room 1', 1, 76),
(3, 'Resident 3 in Room 2', 2, 91), (4, 'Resident 4 in Room 2', 2, 67),
-- House B = 2 passed, 2 failed
(5, 'Resident 5 in Room 3', 3, 60), (6, 'Resident 6 in Room 3', 3, 64),
(7, 'Resident 7 in Room 4', 4, 28), (8, 'Resident 8 in Room 4', 4, 42),
-- House C = 3 passed, 1 failed
(9, 'Resident 9 in Room 5', 5, 99), (10, 'Resident 10 in Room 5', 5, 57),
(11, 'Resident 11 in Room 6', 6, 75), (12, 'Resident 12 in Room 6', 6, 38)
Then your query would look something like:
select
HouseName,
[Passed] = SUM(x.Passed),
[Failed] = SUM(x.Failed)
from
Resident re
outer apply (
--// Logic to determine if they passed or failed
--// I arbitrarily chose the number 50 to be the threshold
select [Passed] = case when re.Result >= 50 then 1 else 0 end,
[Failed] = case when re.Result < 50 then 1 else 0 end
) x
inner join Room r on r.RoomId = re.RoomId
inner join House h on h.HouseId = r.HouseId
group by
h.HouseName
here is a fiddle: http://sqlfiddle.com/#!18/30894/1
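For readers without SQL Server at hand, the same counts can be reproduced with conditional aggregation. This is a sketch under assumptions: SQLite via Python's sqlite3 module, shortened names in the test data, and SUM(CASE ...) in place of OUTER APPLY, which SQLite does not support. The threshold of 50 is the same arbitrary choice as above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE House    (HouseId INTEGER, HouseName TEXT);
CREATE TABLE Room     (RoomId INTEGER, RoomName TEXT, HouseId INTEGER);
CREATE TABLE Resident (ResidentId INTEGER, ResidentName TEXT, RoomId INTEGER, Result INTEGER);
INSERT INTO House VALUES (1, 'House A'), (2, 'House B'), (3, 'House C');
INSERT INTO Room VALUES
    (1, 'Room 1', 1), (2, 'Room 2', 1),
    (3, 'Room 3', 2), (4, 'Room 4', 2),
    (5, 'Room 5', 3), (6, 'Room 6', 3);
INSERT INTO Resident VALUES
    (1, 'Resident 1', 1, 82),  (2, 'Resident 2', 1, 76),
    (3, 'Resident 3', 2, 91),  (4, 'Resident 4', 2, 67),
    (5, 'Resident 5', 3, 60),  (6, 'Resident 6', 3, 64),
    (7, 'Resident 7', 4, 28),  (8, 'Resident 8', 4, 42),
    (9, 'Resident 9', 5, 99),  (10, 'Resident 10', 5, 57),
    (11, 'Resident 11', 6, 75), (12, 'Resident 12', 6, 38);
""")

# Each resident row contributes 1 to exactly one of the two sums.
PASS_FAIL_SQL = """
SELECT h.HouseName,
       SUM(CASE WHEN re.Result >= 50 THEN 1 ELSE 0 END) AS Passed,
       SUM(CASE WHEN re.Result <  50 THEN 1 ELSE 0 END) AS Failed
FROM Resident re
INNER JOIN Room r  ON r.RoomId = re.RoomId
INNER JOIN House h ON h.HouseId = r.HouseId
GROUP BY h.HouseName
ORDER BY h.HouseName
"""
house_counts = conn.execute(PASS_FAIL_SQL).fetchall()
print(house_counts)
```

Because each resident joins to exactly one room here, no resident is counted more than once; the repeated rows described in the question would need to be de-duplicated before summing.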
I'm looking to assign unique person IDs to a marketing program, but need to optimize based on each person's Probability Score (some people can be sent to multiple programs, some only one) and have two constraints such as budgeted mail quantity for each program.
I'm using SQL Server and am able to put IDs into their highest scoring program using the row_number() over(partition by person_ID order by Prob_Score), but I need to return a table where each ID is assigned to a program, but I'm not sure how to add the max mail quantity constraint specific to each individual program. I've looked into the Check() constraint functionality, but I'm not sure if that's applicable.
create table test_marketing_table(
PersonID int,
MarketingProgram varchar(255),
ProbabilityScore real
);
insert into test_marketing_table (PersonID, MarketingProgram, ProbabilityScore)
values (1, 'A', 0.07)
,(1, 'B', 0.06)
,(1, 'C', 0.02)
,(2, 'A', 0.02)
,(3, 'B', 0.08)
,(3, 'C', 0.13)
,(4, 'C', 0.02)
,(5, 'A', 0.04)
,(6, 'B', 0.045)
,(6, 'C', 0.09);
--this section assigns everyone to their highest scoring program,
--but this isn't necessarily what I need
with x
as
(
select *, row_number()over(partition by PersonID order by ProbabilityScore desc) as PersonScoreRank
from test_marketing_table
)
select *
from x
where PersonScoreRank='1';
I also need to specify some constraints: two max C packages, one max A & one max B package can be sent. How can I reassign the IDs to a program while also using the highest probability score left available?
The final result should look like:
PersonID MarketingProgram ProbabilityScore PersonScoreRank
3 C 0.13 1
6 C 0.09 1
1 A 0.07 1
6 B 0.045 2
You need to rethink your ROW_NUMBER() formula based on your actual need, and you should also have a table of Marketing Programs to make this work efficiently. This covers the basic ideas you need to incorporate to efficiently perform the filtering you need.
MarketingPrograms Table
CREATE TABLE MarketingPrograms (
ProgramID varchar(10),
PeopleDesired int
)
Populate the MarketingPrograms Table
INSERT INTO MarketingPrograms (ProgramID, PeopleDesired) Values
('A', 1),
('B', 1),
('C', 2)
Use the MarketingPrograms Table
with x as (
select *,
row_number()over(partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
from test_marketing_table
)
select *
from x
INNER JOIN MarketingPrograms m
ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
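To see what a per-program cap returns on the sample data from the question, here is a small sketch. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server, and it ranks within each MarketingProgram value.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_marketing_table (PersonID INTEGER, MarketingProgram TEXT, ProbabilityScore REAL);
CREATE TABLE MarketingPrograms (ProgramID TEXT, PeopleDesired INTEGER);
INSERT INTO test_marketing_table VALUES
    (1, 'A', 0.07), (1, 'B', 0.06), (1, 'C', 0.02),
    (2, 'A', 0.02),
    (3, 'B', 0.08), (3, 'C', 0.13),
    (4, 'C', 0.02),
    (5, 'A', 0.04),
    (6, 'B', 0.045), (6, 'C', 0.09);
INSERT INTO MarketingPrograms VALUES ('A', 1), ('B', 1), ('C', 2);
""")

# Rank people inside each program, then keep only as many as the program wants.
CAPPED_SQL = """
WITH x AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY MarketingProgram
                              ORDER BY ProbabilityScore DESC) AS ProgramScoreRank
    FROM test_marketing_table t
)
SELECT x.PersonID, x.MarketingProgram
FROM x
INNER JOIN MarketingPrograms m ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
ORDER BY x.MarketingProgram, x.ProgramScoreRank
"""
assignments = conn.execute(CAPPED_SQL).fetchall()
print(assignments)
```

On this data the cap alone puts person 3 into both B and C, which differs from the expected result in the question. If a person may receive only one mailing, a further step would be needed to remove already-assigned people before filling the next program.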
I am having trouble writing a script which can delete all the rows which match on the first three columns and where the Quantities sum to zero.
I think the query needs to find all Products that match and then within that group, all the Names which match and then within that subset, all the currencies which match and then, the ones which have quantities netting to zero.
In the below example, the rows which would be deleted would be rows 1&2,4&6.
Product, Name, Currency, Quantity
1) Product A, Name A, GBP, 10
2) Product A, Name A, GBP, -10
3) Product A, Name B, GBP, 10
4) Product A, Name B, USD, 10
5) Product A, Name B, EUR, 10
6) Product A, Name B, USD, -10
7) Product A, Name C, EUR, 10
Hope this makes sense and appreciate any help.
Try this:
DELETE
FROM [Product]
WHERE Id IN
(
SELECT Id
FROM
(
SELECT Id, SUM(Quantity) OVER(PARTITION BY a.Product, a.Name, a.Currency) AS Sm
FROM [Product] a
) a
WHERE Sm = 0
)
You may want to break this problem into parts.
First create a view that lists those combinations which sum to zero
CREATE VIEW vw_foo AS
SELECT product,name, currency, sum(quantity) as net
FROM foo
GROUP BY product, name, currency
HAVING sum(quantity)=0;
At this point, you need to make sure this view has the data you expect to delete. In your example, the view should have only 2 records: ProductA/NameA/GBP and ProductA/NameB/USD
Step 2. Delete the data where the fields match:
DELETE FROM foo
WHERE EXISTS
(SELECT *
FROM vw_foo
WHERE vw_foo.product = foo.product
AND vw_foo.name = foo.name
AND vw_foo.currency = foo.currency);
One way to simplify the SQL is to just concatenate the 3 columns into one and apply some grouping:
delete from product
where product + name + currency in (
select product + name + currency
from product
group by product + name + currency
having sum(quantity) = 0)
I am assuming this is an accounting problem with offsetting pairs of entries in the ledger.
If there are, for instance, three entries for the combination (A, A, GBP), this code and some of the examples above will not work.
I created a temporary test table, loaded it with your data, used a CTE (common table expression) to find the duplicate pattern, and joined it to the table to select the rows.
Just change the 'select *' to 'delete p'.
Again, this only works for equal offsetting pairs. It will cause havoc with odd number of entries.
Do you have only even number of entries?
Sincerely
John
-- create sample table
create table #products
(
product_id int identity(1,1),
product_txt varchar(16),
name_txt varchar(16),
currency_cd varchar(16),
quantity_num int
);
go
-- add data 2 table
insert into #products
(product_txt, name_txt, currency_cd, quantity_num)
values
('A', 'A', 'GBP', 10),
('A', 'A', 'GBP', -10),
('A', 'B', 'GBP', 10),
('A', 'B', 'USD', 10),
('A', 'B', 'EUR', 10),
('A', 'B', 'USD', -10),
('A', 'C', 'EUR', 10);
go
-- show the data
select * from #products;
go
-- use cte to find combinations
with cte_Ledger_Offsets (product_txt, name_txt, currency_cd)
as
(
select product_txt, name_txt, currency_cd
from #products
group by product_txt, name_txt, currency_cd
having sum(quantity_num) = 0
)
select * from #products p inner join cte_Ledger_Offsets c
on p.product_txt = c.product_txt and
p.name_txt = c.name_txt and
p.currency_cd = c.currency_cd;
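The delete itself can be sketched end to end as follows. This is an illustration under assumptions: SQLite via Python's sqlite3 module instead of SQL Server, an ordinary table named products instead of a temp table, and an IN subquery on the key in place of the joined delete.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    product_txt  TEXT,
    name_txt     TEXT,
    currency_cd  TEXT,
    quantity_num INTEGER
);
INSERT INTO products (product_txt, name_txt, currency_cd, quantity_num) VALUES
    ('A', 'A', 'GBP',  10),
    ('A', 'A', 'GBP', -10),
    ('A', 'B', 'GBP',  10),
    ('A', 'B', 'USD',  10),
    ('A', 'B', 'EUR',  10),
    ('A', 'B', 'USD', -10),
    ('A', 'C', 'EUR',  10);
""")

# Find the (product, name, currency) groups that net to zero,
# then delete every row belonging to one of those groups.
conn.execute("""
DELETE FROM products
WHERE product_id IN (
    SELECT p.product_id
    FROM products AS p
    INNER JOIN (
        SELECT product_txt, name_txt, currency_cd
        FROM products
        GROUP BY product_txt, name_txt, currency_cd
        HAVING SUM(quantity_num) = 0
    ) AS g
      ON p.product_txt = g.product_txt
     AND p.name_txt    = g.name_txt
     AND p.currency_cd = g.currency_cd
)
""")
remaining_ids = [row[0] for row in conn.execute(
    "SELECT product_id FROM products ORDER BY product_id")]
print(remaining_ids)
```

Rows 1, 2, 4 and 6 are removed and rows 3, 5 and 7 remain. Keeping the three columns as separate join keys also avoids a pitfall of concatenation, where 'AB' + 'C' and 'A' + 'BC' produce the same string.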
I have 3 tables:
recipe:
id, name
ingredient:
id, name
recipeingredient:
id, recipeId, ingredientId, quantity
Every time a customer creates a new recipe, I need to check the recipeingredient table to verify whether this recipe already exists. If ingredientId and quantity are exactly the same, I will tell the customer the recipe already exists. Since I need to check multiple rows, I need help writing this query.
Knowing your ingredients and quantities, you can do something like this:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 --must match # of ingredients in WHERE clause
I originally thought that the following query would find pairs of recipes that have exactly the same ingredients:
select ri1.recipeId, ri2.recipeId
from RecipeIngredient ri1 full outer join
RecipeIngredient ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having count(ri1.id) = count(ri2.id) and -- same number of ingredients
count(ri1.id) = count(*) and -- all r1 ingredients are present
count(*) = count(ri2.id) -- all r2 ingredients are present
However, this query doesn't count things correctly, because the mismatches don't have the right pairs of ids. Alas.
The following does do the correct comparison. It counts the ingredients in each recipe before the join, so this value can just be compared on all matching rows.
select ri1.recipeId, ri2.recipeId
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri1 full outer join
(select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having max(ri1.numingredients) = max(ri2.numingredients) and
max(ri1.numingredients) = count(*)
The having clause guarantees that each recipe has the same number of ingredients, and that the number of matching ingredients is the total. This time, I've tested it on the following data:
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
insert into #recipeingredient select 2, 3, 10
insert into #recipeingredient select 3, 1, 1
insert into #recipeingredient select 4, 1, 1
insert into #recipeingredient select 4, 3, 10
insert into #recipeingredient select 5, 1, 1
insert into #recipeingredient select 5, 2, 10
If you have a new recipe, you can modify this query to just look for the recipe in one of the tables (say ri1) using an additional condition on the on clause.
If you place the ingredients in a temporary table, you can substitute one of these tables, say ri1, with the new table.
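The temporary-table idea can be sketched like this. It is only an illustration under assumptions: SQLite via Python's sqlite3 module instead of SQL Server, a hypothetical table new_recipe holding the candidate recipe, and each ingredient appearing at most once per recipe.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipeingredient (
    id           INTEGER PRIMARY KEY,
    recipeId     INTEGER NOT NULL,
    ingredientId INTEGER NOT NULL,
    quantity     INTEGER NOT NULL
);
INSERT INTO recipeingredient (recipeId, ingredientId, quantity) VALUES
    (1, 1, 1), (1, 2, 10),
    (2, 1, 1), (2, 2, 10), (2, 3, 10),
    (3, 1, 1),
    (4, 1, 1), (4, 3, 10),
    (5, 1, 1), (5, 2, 10);

-- Hypothetical staging table for the recipe the customer is about to create.
CREATE TEMP TABLE new_recipe (ingredientId INTEGER, quantity INTEGER);
INSERT INTO new_recipe VALUES (1, 1), (2, 10);
""")

# A stored recipe is an exact match when every one of its rows finds a partner
# in new_recipe and it has as many rows as new_recipe.
MATCH_SQL = """
SELECT ri.recipeId
FROM recipeingredient ri
LEFT JOIN new_recipe n
       ON n.ingredientId = ri.ingredientId
      AND n.quantity     = ri.quantity
GROUP BY ri.recipeId
HAVING COUNT(*) = COUNT(n.ingredientId)
   AND COUNT(*) = (SELECT COUNT(*) FROM new_recipe)
ORDER BY ri.recipeId
"""
matching_recipes = [row[0] for row in conn.execute(MATCH_SQL)]
print(matching_recipes)
```

Recipes 1 and 5 match; recipe 2 is rejected because it has an extra ingredient, and recipes 3 and 4 because they do not cover the new recipe.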
You might try something like this to find if you have a duplicate:
-- Setup test data
declare @recipeingredient table (
id int not null primary key identity
, recipeId int not null
, ingredientId int not null
, quantity int not null
)
insert into @recipeingredient select 1, 1, 1
insert into @recipeingredient select 1, 2, 10
insert into @recipeingredient select 2, 1, 1
insert into @recipeingredient select 2, 2, 10
-- Actual Query
if exists (
select *
from @recipeingredient old
full outer join @recipeingredient new
on old.recipeId != new.recipeId -- Different recipes
and old.ingredientId = new.ingredientId -- but same ingredients
and old.quantity = new.quantity -- and same quantities
where old.id is null -- Match not found
or new.id is null -- Match not found
)
begin
select cast(0 as bit) as IsDuplicateRecipe
end
else begin
select cast(1 as bit) as IsDuplicateRecipe
end
Since this is really only searching for a duplicate, you might want to substitute a temp table or pass a table variable for the "new" table. This way you wouldn't have to insert the new records before doing your search. You could also insert into the base tables, wrap the whole thing in a transaction and rollback based upon the results.
I have a rating system in which any person may review another. Each person can be rated by the same person more than once. For the calculation of averages, I would like to include only the most recent values.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <- ignored because there is a newer rating of person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all Evaluatee's, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
This might look like it has an implicit assumption that no entries exist with the same date;
but that's actually not a problem: if such entries can exist, then you cannot decide which of them was made later anyway; you could only choose randomly between them. As shown here, they are both included and averaged, which might be the best solution you can get for that border case (although it slightly favors that person by giving them two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index - the obvious primary key choice here being the columns (Evaluator, Evaluatee, Date).
declare @T table
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into @T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from @T
) as T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
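The "latest rating per evaluator" idea can be tried out without SQL Server. The following sketch assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module, ISO-formatted date strings, and a hypothetical column name RatedOn; note that SQLite's AVG returns a float where SQL Server returns an int for int input.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ratings (Evaluator TEXT, Evaluatee TEXT, Rating INTEGER, RatedOn TEXT);
INSERT INTO Ratings VALUES
    ('Person 1', 'Person 2', 5, '2011-02-01'),
    ('Person 1', 'Person 2', 2, '2011-03-01'),
    ('Person 2', 'Person 1', 6, '2011-02-01'),
    ('Person 2', 'Person 1', 3, '2011-03-01'),
    ('Person 3', 'Person 1', 5, '2011-05-01');
""")

# Number each evaluator's ratings of a person from newest to oldest,
# keep only the newest one, then average per evaluatee.
LATEST_AVG_SQL = """
SELECT Evaluatee, AVG(Rating) AS AvgRating
FROM (
    SELECT Evaluatee, Rating,
           ROW_NUMBER() OVER (PARTITION BY Evaluatee, Evaluator
                              ORDER BY RatedOn DESC) AS rn
    FROM Ratings
) AS latest
WHERE rn = 1
GROUP BY Evaluatee
ORDER BY Evaluatee
"""
averages = conn.execute(LATEST_AVG_SQL).fetchall()
print(averages)
```

Unlike the NOT EXISTS variant, ROW_NUMBER keeps exactly one row per evaluator even when two ratings share the same date, so which of the tied rows is kept is arbitrary.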
This is possible to do, but it can be REALLY hairy - SQL was not designed to compare rows, only columns. I would strongly recommend you keep an additional table containing only the most recent data, and store the rest in an archive table.
If you must do it this way, then I'll need a full table structure to try to write a query for this. In particular I need to know which are the unique indexes.