I have a rating system in which any person may rate any other person. A person can be rated by the same person more than once. When calculating averages, I would like to include only each evaluator's most recent rating.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <-- ignored because there is a newer rating by Person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all evaluatees, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
This might look like it implicitly assumes that no two entries have the same date,
but that is not actually a problem: if such entries can exist, you cannot decide which of them was made later anyway; you could only choose arbitrarily between them. As shown here, both are included and averaged, which might be the best solution you can get for that edge case (although it slightly favors that evaluator by giving them two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index - the obvious primary key choice here being the columns (Evaluator, Evaluatee, Date).
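As a rough sketch of that suggestion, the snippet below uses Python's built-in sqlite3 module rather than SQL Server, purely so it can be run as-is. The lowercase table name and the rated_on column are assumptions made for the illustration. It shows the composite primary key rejecting a second rating for the same evaluator, evaluatee and date, while the NOT EXISTS query still returns the expected averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ratings (
        evaluator TEXT    NOT NULL,
        evaluatee TEXT    NOT NULL,
        rating    INTEGER NOT NULL,
        rated_on  TEXT    NOT NULL,   -- ISO dates compare correctly as text
        PRIMARY KEY (evaluator, evaluatee, rated_on)
    )
""")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?)",
    [
        ("Person 1", "Person 2", 5, "2011-02-01"),
        ("Person 1", "Person 2", 2, "2011-03-01"),
        ("Person 2", "Person 1", 6, "2011-02-01"),
        ("Person 2", "Person 1", 3, "2011-03-01"),
        ("Person 3", "Person 1", 5, "2011-05-01"),
    ],
)

# A second rating for the same evaluator, evaluatee and date violates the key.
try:
    conn.execute(
        "INSERT INTO ratings VALUES ('Person 3', 'Person 1', 1, '2011-05-01')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Keep only rows for which no later rating by the same evaluator exists.
averages = conn.execute("""
    SELECT evaluatee, AVG(rating)
    FROM ratings r1
    WHERE NOT EXISTS (
        SELECT 1
        FROM ratings r2
        WHERE r2.evaluatee = r1.evaluatee
          AND r2.evaluator = r1.evaluator
          AND r2.rated_on  > r1.rated_on)
    GROUP BY evaluatee
    ORDER BY evaluatee
""").fetchall()

print(duplicate_rejected, averages)
```

With the key in place, ties on the date can no longer occur for one evaluator and evaluatee pair, so the "two votes" edge case disappears.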
declare @T table
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into @T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from @T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
This is possible to do, but it can get really hairy: SQL was not designed to compare rows, only columns. I would strongly recommend that you keep an additional table containing only the most recent data and store the rest in an archive table.
If you must do it this way, then I'll need the full table structure to try to write a query for this. In particular, I need to know which indexes are unique.
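One possible reading of the "most recent data" table, sketched with Python's sqlite3 module so it runs as-is. The table names ratings_current and ratings_archive and the helper record_rating are hypothetical. For simplicity the archive here keeps every rating, and ratings are assumed to arrive in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings_archive (
        evaluator INTEGER, evaluatee INTEGER, rating INTEGER, rated_on TEXT
    );
    CREATE TABLE ratings_current (
        evaluator INTEGER, evaluatee INTEGER, rating INTEGER, rated_on TEXT,
        PRIMARY KEY (evaluator, evaluatee)   -- one row per evaluator/evaluatee pair
    );
""")


def record_rating(evaluator, evaluatee, rating, rated_on):
    """Archive the rating and overwrite the pair's row in ratings_current."""
    row = (evaluator, evaluatee, rating, rated_on)
    conn.execute("INSERT INTO ratings_archive VALUES (?, ?, ?, ?)", row)
    # INSERT OR REPLACE deletes any existing row with the same primary key first.
    conn.execute("INSERT OR REPLACE INTO ratings_current VALUES (?, ?, ?, ?)", row)


for row in [
    (1, 2, 5, "2011-02-01"),
    (1, 2, 2, "2011-03-01"),
    (2, 1, 6, "2011-02-01"),
    (2, 1, 3, "2011-03-01"),
    (3, 1, 5, "2011-05-01"),
]:
    record_rating(*row)

# The average no longer needs any row-to-row comparison.
current_averages = conn.execute("""
    SELECT evaluatee, AVG(rating)
    FROM ratings_current
    GROUP BY evaluatee
    ORDER BY evaluatee
""").fetchall()
archived_count = conn.execute("SELECT COUNT(*) FROM ratings_archive").fetchone()[0]

print(current_averages, archived_count)
```

The trade-off is that every write touches two tables, and out-of-order inserts would need an extra date check before replacing the current row.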
I need some help identifying records which do not have a specific value associated with them.
Need:
Each distinct customer record can have multiple methods of contact, for example:
Cheryl Hubert has the following contact records:
Code value: 1
Description: home phone
CustomerData: 123-456-7890
Code value: 2
Description: work phone
CustomerData: 000-123-4567
Code value: 3
Description: email
CustomerData: chubert@xxx.xxx
Customers may have none of these, or some of these.
I need to write a query to find all those customer records which DO NOT have an email address (code value 3). I've seen queries with 'not exists' but not sure that would be the right way. Keep in mind that the same field name is used for all contact data (CustomerData).
The code value/description provides what is within the CustomerData field.
Any help appreciated.
Let's say the contact info is in a table contactRecords, which looks something like this:
customerId int,
codeValue int,
description varchar,
customerData varchar
To get all of the customers who do not have an email record (where codeValue = 3), try something like this:
select distinct customerId
from contactRecords
where customerId not in (
select distinct customerId
from contactRecords
where codeValue = 3)
The inner query finds all customers who have an email record. The outer query finds all but those customers.
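One caveat worth checking, sketched here with Python's sqlite3 module: if the subquery can return a NULL customerId, NOT IN matches nothing at all, whereas NOT EXISTS is unaffected. The row with a NULL customerId and the second customer are hypothetical data added only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contactRecords (
        customerId INTEGER, codeValue INTEGER,
        description TEXT, customerData TEXT
    )
""")
conn.executemany(
    "INSERT INTO contactRecords VALUES (?, ?, ?, ?)",
    [
        (1, 1, "home phone", "123-456-7890"),
        (1, 3, "email", "chubert@xxx.xxx"),
        (2, 1, "home phone", "012-345-6789"),
        (None, 3, "email", "orphan@xxx.xxx"),  # hypothetical row with NULL id
    ],
)

# The subquery yields {1, NULL}; "2 NOT IN (1, NULL)" evaluates to NULL, not true.
not_in = conn.execute("""
    SELECT DISTINCT customerId
    FROM contactRecords
    WHERE customerId NOT IN (
        SELECT customerId FROM contactRecords WHERE codeValue = 3)
""").fetchall()

# NOT EXISTS only asks whether a matching email row exists for this customer.
not_exists = conn.execute("""
    SELECT DISTINCT c.customerId
    FROM contactRecords c
    WHERE c.customerId IS NOT NULL
      AND NOT EXISTS (
        SELECT 1
        FROM contactRecords e
        WHERE e.customerId = c.customerId
          AND e.codeValue = 3)
""").fetchall()

print(not_in, not_exists)
```

If customerId is declared NOT NULL, both forms return the same customers. Note also that either query only sees customers who have at least one contact record.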
As you posted almost no data, I will try to guess your structure. Assume you have clients in one table and contacts in another that carries the client id. When you want to find rows in one table that have no match in another, you usually select from the client table, LEFT JOIN the contact table, and add a WHERE clause requiring one of the contact columns to be NULL. If you specifically want the value 3, put that condition directly in the JOIN clause.
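A minimal sketch of that LEFT JOIN pattern, run through Python's sqlite3 module. The table names customers and contacts, their columns, and the second and third customers are assumptions, since the real schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contacts (customer_id INTEGER, code_value INTEGER, customer_data TEXT);

    INSERT INTO customers VALUES
        (1, 'Cheryl Hubert'),
        (2, 'Customer Two'),      -- hypothetical: phone only
        (3, 'Customer Three');    -- hypothetical: no contact records at all

    INSERT INTO contacts VALUES
        (1, 1, '123-456-7890'),
        (1, 3, 'chubert@xxx.xxx'),
        (2, 1, '012-345-6789');
""")

# The code_value filter sits in the ON clause, so customers without an email
# still produce one row, with NULLs in every column of e.
without_email = conn.execute("""
    SELECT c.id, c.name
    FROM customers c
    LEFT JOIN contacts e
           ON e.customer_id = c.id
          AND e.code_value = 3
    WHERE e.customer_id IS NULL
    ORDER BY c.id
""").fetchall()

print(without_email)
```

Moving the code_value = 3 test into the WHERE clause instead would turn the query back into an inner join and return no rows, which is why it belongs in the join condition.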
Try this query:
select *
from customers c
where not exists(select 1 from contact_method
where customer_id = c.id
and description = 'email');
I assumed such schema:
create table customers(id int, name varchar(20));
insert into customers values (1, 'Cheryl Hubert');
create table contact_method (id int, customer_id int, code_value int, description varchar(20), customer_data varchar(20));
insert into contact_method values (1, 1, 1, 'home phone', '123-456-7890');
insert into contact_method values (2, 1, 2, 'work phone', '000-123-4567');
insert into contact_method values (3, 1, 3, 'email', 'chubert@xxx.xxx');
You can use the GROUP BY and HAVING clauses to check:
Oracle Setup:
CREATE TABLE contact_details ( code_value, customerid, description, customerdata ) AS
SELECT 1, 1, 'home phone', '123-456-7890' FROM DUAL UNION ALL
SELECT 2, 1, 'work phone', '000-123-4567' FROM DUAL UNION ALL
SELECT 3, 1, 'email', 'chubert@xxx.xxx' FROM DUAL UNION ALL
SELECT 4, 2, 'home phone', '012-345-6789' FROM DUAL;
Query:
SELECT customerid
FROM contact_details
GROUP BY customerid
HAVING COUNT( CASE description WHEN 'email' THEN 1 END ) = 0
Output:
| CUSTOMERID |
|------------|
| 2 |
I have a SQL Server 2012 table with ID, First Name and Last Name. The ID should be unique per person, but due to an error in the historical feed, different people were assigned the same ID.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEF T
Two IDs are present multiple times: 1 and 4. The rows with ID 4 are identical, so I don't want them in my result. For the rows with ID 1, the first name is the same, but the last name differs in one row. I want only those IDs that appear more than once with a different first or last name.
I tried loading ID's which have multiple occurrences into a temp table and tried to compare it against the parent table albeit unsuccessfully. Any other ideas that I can try and implement?
This is the output I am looking for
ID
---
1
If you want the ids, then use aggregation and having:
select id
from t
group by id
having min(firstname) <> max(firstname) or min(lastname) <> max(lastname);
Try This:
CREATE TABLE #myTable(id INT, firstname VARCHAR(50), lastname VARCHAR(50))
INSERT INTO #myTable VALUES
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'N'),
(2, 'BCD', 'S'),
(3, 'CDE', 'T'),
(4, 'DEF', 'T'),
(4, 'DEF', 'T')
SELECT id FROM (
SELECT DISTINCT id, firstname, lastname
FROM #myTable) t GROUP BY id HAVING COUNT(*)>1
Output: 1
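The two approaches above agree on the sample data, but they can differ when a name is NULL: MIN and MAX skip NULLs, while SELECT DISTINCT treats NULL as its own value. The sketch below, using Python's sqlite3 module, adds a hypothetical ID 5 with one NULL last name to show the difference. The table name people is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "N"),
        (2, "BCD", "S"),
        (3, "CDE", "T"),
        (4, "DEF", "T"), (4, "DEF", "T"),
        (5, "EFG", None), (5, "EFG", "U"),   # hypothetical rows with a NULL
    ],
)

# MIN/MAX ignore NULLs, so ID 5 looks consistent to this query.
by_min_max = conn.execute("""
    SELECT id
    FROM people
    GROUP BY id
    HAVING MIN(firstname) <> MAX(firstname)
        OR MIN(lastname)  <> MAX(lastname)
    ORDER BY id
""").fetchall()

# DISTINCT keeps (5, 'EFG', NULL) and (5, 'EFG', 'U') as two separate rows.
by_distinct = conn.execute("""
    SELECT id
    FROM (SELECT DISTINCT id, firstname, lastname FROM people) AS t
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(by_min_max, by_distinct)
```

Which behaviour is right depends on whether a missing name should count as a mismatch in your data.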
A simplified example for illustration: consider a table "fruit" with 3 columns: name, count, and the date purchased. I need an alphabetical list of the fruits and their count the last time they were bought. I am a bit confused by the order of sorting and how distinct is applied. My attempt:
drop table if exists fruit;
create table fruit (
name varchar(8),
count integer,
dateP datetime
);
insert into fruit (name, count, dateP) values
('apple', 4, '2014-03-18 16:24:37'),
('orange', 2, '2013-12-11 11:20:16'),
('apple', 7, '2014-07-05 08:34:21'),
('banana', 6, '2014-06-20 19:10:15'),
('orange', 6, '2014-07-22 17:41:12'),
('banana', 4, '2014-08-15 21:26:37'), -- last
('orange', 5, '2014-12-11 11:20:16'), -- last
('apple', 3, '2014-09-25 18:54:32'), -- last
('apple', 5, '2014-02-05 18:47:18'),
('apple', 12, '2013-09-25 14:18:57'),
('banana', 5, '2013-04-18 15:59:04'),
('apple', 9, '2014-01-29 11:47:45');
-- Expecting:
-- apple 3
-- banana 4
-- orange 5
select distinct name, count
from fruit
group by name
order by name, dateP;
-- Produces:
-- apple 9
-- banana 5
-- orange 5
Try this:
select f1.name,f1.count
from
fruit f1
inner join
(select name,max(dateP) date_P from fruit group by name) f2
on f1.name = f2.name and f1.dateP = f2.date_P
order by f1.name
Try the following:
SELECT fruit.name, fruit.count, fruit.dateP
FROM fruit
INNER JOIN (
SELECT name, Max(dateP) AS lastPurchased
FROM fruit
GROUP BY name
) AS dt ON (dt.name = fruit.name AND dt.lastPurchased = fruit.dateP )
When I faced a similar situation before, I resolved it as follows. It requires a primary key; in this case I have added UID.
SELECT a.Name,a.Count FROM Fruit a WHERE a.UID IN
(SELECT b.UID FROM Fruit b
WHERE b.Name = a.Name ORDER BY b.DateP Desc,b.UID DESC LIMIT 1)
This also handles the possibility that the same fruit was purchased twice at the exact same time; that is unlikely in this example, but in a large-scale system it could come back to haunt you. Because the query also orders by UID, it chooses the purchase most recently added to the table (assuming an incrementing primary key).
In SQLite 3.7.11 or later, you can use MAX/MIN to select from which record in a group other values are returned (but this requires that you have that maximum in the result):
SELECT name, count, MAX(dateP)
FROM fruit
GROUP BY name
ORDER BY name
If you want to improve performance, you could also try Common Table Expressions instead of nested SELECT clauses, although whether that helps depends on the database engine.
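For reference, here is roughly what a CTE version of the "latest row per fruit" join could look like, run through Python's sqlite3 module against the question's data. The CTE name last_purchase and the alias lastP are hypothetical, and no performance claim is implied; the sketch only shows the form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (name VARCHAR(8), count INTEGER, dateP DATETIME)")
conn.executemany(
    "INSERT INTO fruit (name, count, dateP) VALUES (?, ?, ?)",
    [
        ("apple", 4, "2014-03-18 16:24:37"),
        ("orange", 2, "2013-12-11 11:20:16"),
        ("apple", 7, "2014-07-05 08:34:21"),
        ("banana", 6, "2014-06-20 19:10:15"),
        ("orange", 6, "2014-07-22 17:41:12"),
        ("banana", 4, "2014-08-15 21:26:37"),
        ("orange", 5, "2014-12-11 11:20:16"),
        ("apple", 3, "2014-09-25 18:54:32"),
        ("apple", 5, "2014-02-05 18:47:18"),
        ("apple", 12, "2013-09-25 14:18:57"),
        ("banana", 5, "2013-04-18 15:59:04"),
        ("apple", 9, "2014-01-29 11:47:45"),
    ],
)

# The CTE names the per-fruit latest date once; the outer query joins back
# to fetch the count stored on that row.
latest_counts = conn.execute("""
    WITH last_purchase AS (
        SELECT name, MAX(dateP) AS lastP
        FROM fruit
        GROUP BY name
    )
    SELECT f.name, f.count
    FROM fruit f
    JOIN last_purchase lp
      ON lp.name  = f.name
     AND lp.lastP = f.dateP
    ORDER BY f.name
""").fetchall()

print(latest_counts)
```

Like the join-based answers, this returns more than one row for a fruit if two purchases share the same latest timestamp.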
It's possible to create a unique index across tables, basically using a view and a unique index.
I have a problem though.
Given two (or three) tables.
Company
- Id
- Name
Brand
- Id
- CompanyId
- Name
- Code
Product
- Id
- BrandId
- Name
- Code
I want to ensure that the combinations
Company / Brand.Code
and
Company / Brand / Product.Code
are unique.
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT b.CompanyId, b.Code
FROM dbo.Brand b
UNION ALL
SELECT b.CompanyId, p.Code
FROM dbo.Product p
INNER JOIN dbo.Brand b ON p.BrandId = b.Id
The creation of the view is successful.
CREATE UNIQUE CLUSTERED INDEX UIX_UniquePrefixCode
ON TestView(CompanyId, Code)
This fails because of the UNION
How can I solve this scenario?
Basically Code for both Brand/Product cannot be duplicated within a company.
Notes:
Error that I get is:
Msg 10116, Level 16, State 1, Line 3 Cannot create index on view
'XXXX.dbo.TestView' because it contains one or more UNION, INTERSECT,
or EXCEPT operators. Consider creating a separate indexed view for
each query that is an input to the UNION, INTERSECT, or EXCEPT
operators of the original view.
Notes 2:
When I'm using the sub query I get the following error:
Msg 10109, Level 16, State 1, Line 3 Cannot create index on view
"XXXX.dbo.TestView" because it references derived table "a"
(defined by SELECT statement in FROM clause). Consider removing the
reference to the derived table or not indexing the view.
Notes 3:
So, given the brands and products from @spaghettidba's answer:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
The expectation is, the Brand Code + Company or Product Code + Company is unique, if we expand the results out.
Company / Brand|Product Code
1 / 100 <-- Brand
1 / 400 <-- Brand
1 / 1 <-- Product
1 / 2 <-- Product
1 / 5 <-- Product
2 / 200 <-- Brand
3 / 300 <-- Brand
3 / 500 <-- Brand
3 / 3 <-- Product
3 / 301 <-- Product
There are no duplicates. Now suppose we have a brand and a product with the same code:
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(6, 1, 'Brand 6', 999)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1006, 2, 'Product 1006', 999)
The product belongs to a different Company, so we get
Company / Brand|Product Code
1 / 999 <-- Brand
2 / 999 <-- Product
This is unique.
But if you have 2 brands, and 1 product.
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES
(7, 1, 'Brand 7', 777),
(8, 1, 'Brand 8', 888)
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1007, 8, 'Product 1007', 777)
This would produce
Company / Brand|Product Code
1 / 777 <-- Brand
1 / 888 <-- Brand
1 / 777 <-- Product
This would not be allowed.
Hope that makes sense.
Notes 4:
@spaghettidba's answer solved the cross-table problem; the second issue was duplicates in the Brand table itself.
I've managed to solve this by creating a separate index on the brand table:
CREATE UNIQUE NONCLUSTERED INDEX UIX_UniquePrefixCode23
ON Brand(CompanyId, Code)
WHERE Code IS NOT NULL;
I blogged about a similar solution back in 2011. You can find the post here:
http://spaghettidba.com/2011/08/03/enforcing-complex-constraints-with-indexed-views/
Basically, you have to create a table that contains exactly two rows and you will use that table in CROSS JOIN to duplicate the rows that violate your business rules.
In your case, the indexed view is a bit harder to code because of the way you expressed the business rule. In fact, checking uniqueness on the UNIONed tables through an indexed view is not permitted, as you already have seen.
However, the constraint can be expressed in a different way: since the companyId is implied by the brand, you can avoid the UNION and simply use a JOIN between product and brand and check uniqueness by adding the JOIN predicate on the code itself.
You didn't provide any sample data, so I hope you won't mind if I make some up for you:
CREATE TABLE Company (
Id int PRIMARY KEY,
Name varchar(50)
)
CREATE TABLE Brand (
Id int PRIMARY KEY,
CompanyId int,
Name varchar(50),
Code int
)
CREATE TABLE Product (
Id int PRIMARY KEY,
BrandId int,
Name varchar(50),
Code int
)
GO
INSERT INTO Brand
(
Id,
CompanyId,
Name,
Code
)
VALUES (1, 1, 'Brand 1', 100 ),
(2, 2, 'Brand 2', 200 ),
(3, 3, 'Brand 3', 300 ),
(4, 1, 'Brand 4', 400 ),
(5, 3, 'Brand 5', 500 )
INSERT INTO Product
(
Id,
BrandId,
Name,
Code
)
VALUES
(1001, 1, 'Product 1001', 1 ),
(1002, 1, 'Product 1002', 2 ),
(1003, 3, 'Product 1003', 3 ),
(1004, 3, 'Product 1004', 301 ),
(1005, 4, 'Product 1005', 5 )
As far as I can tell, no rows violating the business rules are present yet.
Now we need the indexed view and the two rows table:
CREATE TABLE tworows (
n int
)
INSERT INTO tworows values (1),(2)
GO
And here's the indexed view:
CREATE VIEW TestView
WITH SCHEMABINDING
AS
SELECT 1 AS one
FROM dbo.Brand b
INNER JOIN dbo.Product p
ON p.BrandId = b.Id
AND p.code = b.code
CROSS JOIN dbo.tworows AS t
GO
CREATE UNIQUE CLUSTERED INDEX IX_TestView ON dbo.TestView(one)
This update should break the business rules:
UPDATE product SET code = 300 WHERE code = 301
In fact you get an error:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.TestView' with unique index 'IX_TestView'. The duplicate key value is (1).
The statement has been terminated.
Hope this helps.
I am trying to run the two queries below on the same table and hoping to get the results in two different columns.
Query 1: select ID as M from table where field = 1
returns:
1
2
3
Query 2: select ID as N from table where field = 2
returns:
4
5
6
My goal is to get
Column1 - Column2
-----------------
1 4
2 5
3 6
Any suggestions? I am using SQL Server 2008 R2
Thanks
For a JOIN between two sets of rows to be meaningful, there should be a primary key to foreign key relationship between them.
That is the idea behind relational algebra and normalization. Otherwise, the correlation of the data is meaningless.
http://en.wikipedia.org/wiki/Database_normalization
The CROSS JOIN will give you all possibilities. (1,4), (1,5), (1, 6) ... (3,6). I do not think that is what you want.
You can always use a ROW_NUMBER() OVER () function to generate a surrogate key in both tables. Order the data the way you want inside the OVER () clause. However, this is still not in any Normal form.
In short: why do this?
Quick test database. Stores products from sporting goods and home goods using non-normal form.
The results of the SELECT do not mean anything.
-- Just play
use tempdb;
go
-- Drop table
if object_id('abnormal_form') > 0
drop table abnormal_form
go
-- Create table
create table abnormal_form
(
Id int,
Category int,
Name varchar(50)
);
-- Load store products
insert into abnormal_form values
(1, 1, 'Bike'),
(2, 1, 'Bat'),
(3, 1, 'Ball'),
(4, 2, 'Pot'),
(5, 2, 'Pan'),
(6, 2, 'Spoon');
-- Sporting Goods
select * from abnormal_form where Category = 1
-- Home Goods
select * from abnormal_form where Category = 2
-- Does not mean anything to me
select Id1, Id2 from
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid1, Id as Id1
from abnormal_form where Category = 1) as s
join
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid2, Id as Id2
from abnormal_form where Category = 2) as h
on s.Rid1 = h.Rid2
We definitely need more information from the user.