Combining select distinct with group and ordering - sql

A simplified example for illustration: consider a table "fruit" with three columns: name, count and the date purchased. I need an alphabetical list of the fruits with the count from the last time each was bought. I am a bit confused by the order of sorting and how distinct is applied. My attempt:
drop table if exists fruit;
create table fruit (
name varchar(8),
count integer,
dateP datetime
);
insert into fruit (name, count, dateP) values
('apple', 4, '2014-03-18 16:24:37'),
('orange', 2, '2013-12-11 11:20:16'),
('apple', 7, '2014-07-05 08:34:21'),
('banana', 6, '2014-06-20 19:10:15'),
('orange', 6, '2014-07-22 17:41:12'),
('banana', 4, '2014-08-15 21:26:37'), -- last
('orange', 5, '2014-12-11 11:20:16'), -- last
('apple', 3, '2014-09-25 18:54:32'), -- last
('apple', 5, '2014-02-05 18:47:18'),
('apple', 12, '2013-09-25 14:18:57'),
('banana', 5, '2013-04-18 15:59:04'),
('apple', 9, '2014-01-29 11:47:45');
-- Expecting:
-- apple 3
-- banana 4
-- orange 5
select distinct name, count
from fruit
group by name
order by name, dateP;
-- Produces:
-- apple 9
-- banana 5
-- orange 5

Try this:-
select f1.name,f1.count
from
fruit f1
inner join
(select name,max(dateP) date_P from fruit group by name) f2
on f1.name = f2.name and f1.dateP = f2.date_P
order by f1.name

Try the following:
SELECT fruit.name, fruit.count, fruit.dateP
FROM fruit
INNER JOIN (
SELECT name, Max(dateP) AS lastPurchased
FROM fruit
GROUP BY name
) AS dt ON (dt.name = fruit.name AND dt.lastPurchased = fruit.dateP )
Here is a demo of this example on SQLFiddle.
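To try the join-on-latest-date idea without a server, here is a minimal sketch that runs it through Python's built-in sqlite3 module against the question's sample data. The use of SQLite, the text column for dateP, and the added ORDER BY are assumptions of this sketch, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# dateP is stored as ISO-8601 text, which sorts correctly as a string.
conn.execute("create table fruit (name text, count integer, dateP text)")
conn.executemany(
    "insert into fruit (name, count, dateP) values (?, ?, ?)",
    [
        ("apple", 4, "2014-03-18 16:24:37"),
        ("orange", 2, "2013-12-11 11:20:16"),
        ("apple", 7, "2014-07-05 08:34:21"),
        ("banana", 6, "2014-06-20 19:10:15"),
        ("orange", 6, "2014-07-22 17:41:12"),
        ("banana", 4, "2014-08-15 21:26:37"),
        ("orange", 5, "2014-12-11 11:20:16"),
        ("apple", 3, "2014-09-25 18:54:32"),
        ("apple", 5, "2014-02-05 18:47:18"),
        ("apple", 12, "2013-09-25 14:18:57"),
        ("banana", 5, "2013-04-18 15:59:04"),
        ("apple", 9, "2014-01-29 11:47:45"),
    ],
)

# Find the latest purchase date per fruit, then join back to pick up the count.
latest = conn.execute(
    """
    select f.name, f.count
    from fruit f
    inner join (
        select name, max(dateP) as lastPurchased
        from fruit
        group by name
    ) as dt on dt.name = f.name and dt.lastPurchased = f.dateP
    order by f.name
    """
).fetchall()

for name, count in latest:
    print(name, count)
```

The derived table does the grouping; the outer join only filters, so no DISTINCT is needed as long as a fruit has a single row for its latest date.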

When faced before with a similar situation I resolved as follows, it requires the use of a primary key, in this case I have added UID.
SELECT a.Name,a.Count FROM Fruit a WHERE a.UID IN
(SELECT b.UID FROM Fruit b
WHERE b.Name = a.Name ORDER BY b.DateP Desc,b.UID DESC LIMIT 1)
This also avoids the possibility that the same fruit was purchased twice at the exact same time; unlikely in this example but in a large scale system it is a possibility which could come back to haunt you. It handles this by ordering by UID as well so it will choose the purchase most recently added to the table (assuming incrementing primary key).
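The tie-breaking behaviour can be checked with a small sketch in Python's sqlite3 module. The reduced data set, the lowercase uid column, and the deliberately duplicated timestamp are assumptions made for illustration; the subquery is compared with = rather than IN, which is equivalent here because it returns one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruit ("
    " uid integer primary key autoincrement,"
    " name text, count integer, dateP text)"
)
conn.executemany(
    "insert into fruit (name, count, dateP) values (?, ?, ?)",
    [
        ("apple", 4, "2014-03-18 16:24:37"),
        ("apple", 3, "2014-09-25 18:54:32"),
        ("apple", 8, "2014-09-25 18:54:32"),  # same timestamp, inserted later
        ("banana", 4, "2014-08-15 21:26:37"),
    ],
)

# Correlated subquery: newest date first, highest uid breaks ties.
rows = conn.execute(
    """
    select a.name, a.count
    from fruit a
    where a.uid = (
        select b.uid
        from fruit b
        where b.name = a.name
        order by b.dateP desc, b.uid desc
        limit 1
    )
    order by a.name
    """
).fetchall()

# For contrast, joining on max(dateP) keeps both tied apple rows.
join_rows = conn.execute(
    """
    select f.name, f.count
    from fruit f
    inner join (select name, max(dateP) as d from fruit group by name) m
        on m.name = f.name and m.d = f.dateP
    """
).fetchall()

print(rows)
print(len(join_rows))
```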

In SQLite 3.7.11 or later, you can use MAX/MIN to select from which record in a group other values are returned (but this requires that you have that maximum in the result):
SELECT name, count, MAX(dateP)
FROM fruit
GROUP BY name
ORDER BY name
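A short sketch of that SQLite behaviour, run through Python's sqlite3 module on a reduced copy of the sample data (the reduced data set is an assumption of this sketch). This relies on SQLite's documented handling of bare columns next to a single MAX or MIN; other databases generally reject or handle such a query differently.

```python
import sqlite3

# The bare-column behaviour described above needs SQLite 3.7.11 or later.
assert sqlite3.sqlite_version_info >= (3, 7, 11)

conn = sqlite3.connect(":memory:")
conn.execute("create table fruit (name text, count integer, dateP text)")
conn.executemany(
    "insert into fruit (name, count, dateP) values (?, ?, ?)",
    [
        ("apple", 4, "2014-03-18 16:24:37"),
        ("apple", 3, "2014-09-25 18:54:32"),
        ("apple", 9, "2014-01-29 11:47:45"),
        ("banana", 6, "2014-06-20 19:10:15"),
        ("banana", 4, "2014-08-15 21:26:37"),
        ("orange", 2, "2013-12-11 11:20:16"),
        ("orange", 5, "2014-12-11 11:20:16"),
    ],
)

# count is taken from the same row that supplies max(dateP) in each group.
last_bought = conn.execute(
    "select name, count, max(dateP) from fruit group by name order by name"
).fetchall()

for name, count, _last_date in last_bought:
    print(name, count)
```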

If you want the query to be easier to read, you can use common table expressions instead of nested SELECT clauses; whether that also improves performance depends on the database engine.

Related

SQL for selecting values in a single column by 'AND' condition

I have table data like below:
PersonId  Eat
111       Carrot
111       Apple
111       Orange
222       Carrot
222       Apple
333       Carrot
444       Orange
555       Apple
I need an sql query which return the total number of PersonId's who eat both Carrot and Apple.
In the above example the result is, Result : 2. (PersonId's 111 and 222)
An MS SQL query like 'select count(distinct PersonId) from Person where Eat = 'Carrot' and Eat = 'Apple'' does not work, because no single row can have both values.
You can actually get the count without using a subquery to determine the persons who eat both. Assuming that the rows are unique:
select ( count(distinct case when eat = 'carrot' then personid end) +
count(distinct case when eat = 'apple' then personid end) -
count(distinct personid)
) as num_both
from t
where eat in ('carrot', 'apple')
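The arithmetic behind this query is inclusion-exclusion: (carrot eaters) + (apple eaters) - (people who eat either) leaves the people who eat both. A small sketch with Python sets on the question's data makes that visible; the variable names are only for illustration.

```python
rows = [
    (111, "Carrot"), (111, "Apple"), (111, "Orange"),
    (222, "Carrot"), (222, "Apple"),
    (333, "Carrot"),
    (444, "Orange"),
    (555, "Apple"),
]

carrot = {person for person, eat in rows if eat == "Carrot"}
apple = {person for person, eat in rows if eat == "Apple"}
either = carrot | apple  # matches "where eat in ('carrot', 'apple')"

# |A| + |B| - |A or B| = |A and B|
num_both = len(carrot) + len(apple) - len(either)

print(num_both)
print(sorted(carrot & apple))
```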
SELECT PersonID FROM Person WHERE Eat = 'Carrot'
INTERSECT
SELECT PersonID FROM Person WHERE Eat = 'Apple'
You can use conditional aggregation of a sort:
select
personid
from <yourtable>
group by
personid
having
count (case when eat = 'carrot' then 1 else null end) >= 1
and count (case when eat = 'apple' then 1 else null end) >= 1
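As a quick check that the HAVING form and the INTERSECT form agree, here is a sketch using Python's sqlite3 module. The Person table name follows the question; running it on SQLite instead of SQL Server is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Person (PersonId integer, Eat text)")
conn.executemany(
    "insert into Person values (?, ?)",
    [
        (111, "Carrot"), (111, "Apple"), (111, "Orange"),
        (222, "Carrot"), (222, "Apple"),
        (333, "Carrot"),
        (444, "Orange"),
        (555, "Apple"),
    ],
)

# Conditional aggregation: a person qualifies if both counts are non-zero.
having_ids = [
    row[0]
    for row in conn.execute(
        """
        select PersonId
        from Person
        group by PersonId
        having count(case when Eat = 'Carrot' then 1 end) >= 1
           and count(case when Eat = 'Apple' then 1 end) >= 1
        order by PersonId
        """
    )
]

# Set intersection of the two single-condition queries, then count.
intersect_count = conn.execute(
    """
    select count(*) from (
        select PersonId from Person where Eat = 'Carrot'
        intersect
        select PersonId from Person where Eat = 'Apple'
    )
    """
).fetchone()[0]

print(having_ids, intersect_count)
```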
In this example, I use STRING_AGG to simplify the count by collapsing 'Apple' and 'Carrot' into a single string comparison:
create table #EatTemp
(
PersonId int,
Eat Varchar(50)
)
INSERT INTO #EatTemp VALUES
(111, 'Carrot')
,(111, 'Apple')
,(111, 'Orange')
,(222, 'Carrot')
,(222, 'Apple')
,(333, 'Carrot')
,(444, 'Orange')
,(555, 'Apple')
SELECT Count(PersonId) WhoEatCarrotAndApple FROM
(
SELECT PersonId,
STRING_AGG(Eat, ';')
WITHIN GROUP (ORDER BY Eat) Eat
FROM #EatTemp
WHERE Eat IN ('Apple', 'Carrot')
GROUP BY PersonId
) EatAgg
WHERE Eat = 'Apple;Carrot'
You can use EXISTS statements to achieve your goal. Below is a full set of code you can use to test the results. In this case, this returns a count of 2 since PersonId 111 and 222 match the criteria you specified in your post.
CREATE TABLE Person
( PersonId INT
, Eat VARCHAR(10));
INSERT INTO Person
VALUES
(111, 'Carrot'), (111, 'Apple'), (111, 'Orange'),
(222, 'Carrot'), (222, 'Apple'), (333, 'Carrot'),
(444, 'Orange'), (555, 'Apple');
SELECT COUNT(DISTINCT PersonId)
FROM Person AS p
WHERE EXISTS
(SELECT 1
FROM Person e1
WHERE e1.Eat = 'Apple'
AND p.PersonId = e1.PersonId)
AND EXISTS
(SELECT 1
FROM Person e1
WHERE e1.Eat = 'Carrot'
AND p.PersonId = e1.PersonId);
EXISTS statements have a few advantages:
No chance of changing the granularity of your data since you aren't joining in your FROM clause.
Easy to add additional conditions as needed. Just add more EXISTS statements in your WHERE clause.
The condition is cleanly encapsulated in the EXISTS, so code intent is clear.
If you ever need complex conditions like existence of a value in another table based on specific filter conditions, then you can easily add this without introducing table joins in your main query.
Some alternative solutions such as PersonId IN (SUBQUERY) can introduce unexpected behavior in certain conditions, particularly when the subquery returns a NULL value.
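The best-known case of the NULL surprise is the negated form, NOT IN. A minimal sketch using Python's sqlite3 module shows it; the Blocked table is hypothetical and exists only to supply a subquery that returns a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Person (PersonId integer)")
conn.execute("create table Blocked (PersonId integer)")  # hypothetical table
conn.executemany("insert into Person values (?)", [(111,), (222,), (333,)])
conn.executemany("insert into Blocked values (?)", [(111,), (None,)])

# NOT IN: comparing against NULL yields unknown, so no row passes the filter.
not_in = conn.execute(
    "select PersonId from Person "
    "where PersonId not in (select PersonId from Blocked) "
    "order by PersonId"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so the result is as expected.
not_exists = conn.execute(
    "select p.PersonId from Person p "
    "where not exists (select 1 from Blocked b where b.PersonId = p.PersonId) "
    "order by p.PersonId"
).fetchall()

print(not_in)
print(not_exists)
```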
select
count(PersonID)
from Person
where eat = 'Carrot'
and PersonID in (select PersonID
from Person
where eat = 'Apple');
This first selects the persons who eat apples, and from that result keeps those who eat carrots too.
SELECT COUNT (A.personID) FROM
(SELECT distinct PersonID FROM Person WHERE Eat = 'Carrot'
INTERSECT
SELECT distinct PersonID FROM Person WHERE Eat = 'Apple') as A

Display SUM of 2 listed/GROUP BY values using WHERE condition

I want to add the values of two columns displayed and display as 1 column name.
This is the output I'm getting,
ID Total
Apple 10
RawApple 10
Mango 10
RawMango 10
I want the output as
ID Total
Apples 20
Mangoes 20
If the issue is removing the first three characters -- if they are "Raw" -- then you can do:
select (case when id like 'Raw%' then stuff(id, 1, 3, '') else id end) as id,
sum(total)
from t
group by (case when id like 'Raw%' then stuff(id, 1, 3, '') else id end);
If you want to replace specific values with other values, I would suggest an in-query lookup table:
select coalesce(v.new_id, t.id) as id, sum(total)
from t left join
(values ('RawApple', 'Apple'),
('RawMango', 'Mango')
) v(id, new_id)
on t.id = v.id
group by coalesce(v.new_id, t.id);
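Both ideas can be tried in Python's sqlite3 module. This is a sketch under assumptions: SQLite has no STUFF, so substr(id, 4) stands in for removing the three-character prefix, and the in-query lookup is written as a CTE over VALUES. The table name t and its columns follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, total integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("Apple", 10), ("RawApple", 10), ("Mango", 10), ("RawMango", 10)],
)

# Variant 1: strip a leading "Raw" and group by the remaining name.
stripped = conn.execute(
    """
    select case when id like 'Raw%' then substr(id, 4) else id end as fruit_id,
           sum(total)
    from t
    group by case when id like 'Raw%' then substr(id, 4) else id end
    order by fruit_id
    """
).fetchall()

# Variant 2: map specific values through an in-query lookup table.
looked_up = conn.execute(
    """
    with v(id, new_id) as (
        values ('RawApple', 'Apple'), ('RawMango', 'Mango')
    )
    select coalesce(v.new_id, t.id) as fruit_id, sum(t.total)
    from t
    left join v on t.id = v.id
    group by coalesce(v.new_id, t.id)
    order by fruit_id
    """
).fetchall()

print(stripped)
print(looked_up)
```

The lookup variant is the safer of the two when other ids might legitimately start with "Raw".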
If we can assume that the name of the fruit is after the prefix, and the prefix ends with a hyphen (-), then we can use STUFF to remove the prefix and then aggregate:
WITH VTE AS(
SELECT *
FROM (VALUES('Apple',10),
('Raw-Apple',10),
('Mango',10),
('Raw-Mango',10))V(ID,Total))
SELECT S.ID,
SUM(V.Total) AS Total
FROM VTE V
CROSS APPLY(VALUES(STUFF(V.ID,1,CHARINDEX('-',V.ID),'')))S(ID)
GROUP BY S.ID;
Note that I don't change the names of the fruits to the plural, because the plural form depends on the fruit. You'll need a dictionary table that stores the plural of each fruit and then JOIN to that. So a table that looks like this:
CREATE TABLE dbo.FruitPlural (Fruit varchar(20), Plural varchar(20));
INSERT INTO dbo.FruitPlural
VALUES ('Apple','Apples'),
('Mango','Mangoes'),
('Strawberry','Strawberries'),
...;
Note: this answer was invalidated when the OP moved the goal posts, because the sample data was not representative of their actual data. However, I am leaving it here as it may help future users.

MSSQL ORDER BY Passed List

I am using Lucene to perform queries on a subset of SQL data, which returns a scored list of RecordIDs, e.g. 11,5,3,25,30.
I want to use this list to retrieve a set of results from the full SQL Table by RecordIDs.
So SELECT * FROM MyFullRecord
where RecordID in (11,5,3,25,30)
I would like the retrieved list to maintain the scored order.
I can do it by using an Order by like so;
ORDER BY (CASE WHEN RecordID = 11 THEN 0
WHEN RecordID = 5 THEN 1
WHEN RecordID = 3 THEN 2
WHEN RecordID = 25 THEN 3
WHEN RecordID = 30 THEN 4
END)
I am concerned about the load on the server, especially if I am passing long lists of RecordIDs. Does anyone have experience of this, or know how I can determine an optimum list length?
Are there any other ways to achieve this functionality in MSSQL?
Roger
You can record your list into a table or table variable with sorting priorities.
And then join your table with this sorting one.
CREATE TABLE #tSortOrder (RecordID INT, SortOrder INT)
INSERT INTO #tSortOrder (RecordID, SortOrder)
SELECT 11, 1 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 25, 4 UNION ALL
SELECT 30, 5
SELECT *
FROM yourTable T
INNER JOIN #tSortOrder S ON T.RecordID = S.RecordID -- inner join keeps only the listed RecordIDs
ORDER BY S.SortOrder
Instead of creating a searched CASE expression in the ORDER BY, you could join to an inline table of values. It's easier on the eyes and definitely scales better.
SQL Statement
SELECT mfr.*
FROM MyFullRecord mfr
INNER JOIN (
SELECT *
FROM (VALUES (1, 11),
(2, 5),
(3, 3),
(4, 25),
(5, 30)
) q(ID, RecordID)
) q ON q.RecordID = mfr.RecordID
ORDER BY
q.ID
Look here for a fiddle
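When the ID list comes from application code, the values table can be generated with bound parameters instead of string concatenation. The following is a sketch in Python with sqlite3 standing in for the server; the helper name fetch_in_order and the Title column are hypothetical, and the database's limit on the number of bound parameters caps the list length.

```python
import sqlite3


def fetch_in_order(conn, record_ids):
    """Return MyFullRecord rows for record_ids, in the order of that list."""
    if not record_ids:
        return []
    # One "(?, ?)" pair per id: (position in the scored list, RecordID).
    placeholders = ", ".join("(?, ?)" for _ in record_ids)
    params = [value for pair in enumerate(record_ids) for value in pair]
    sql = f"""
        with q(pos, RecordID) as (values {placeholders})
        select mfr.RecordID, mfr.Title
        from MyFullRecord mfr
        inner join q on q.RecordID = mfr.RecordID
        order by q.pos
    """
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table MyFullRecord (RecordID integer primary key, Title text)")
conn.executemany(
    "insert into MyFullRecord values (?, ?)",
    [(i, f"record {i}") for i in range(1, 41)],
)

ordered = fetch_in_order(conn, [11, 5, 3, 25, 30])
print([record_id for record_id, _title in ordered])
```

Only the placeholder string is interpolated into the SQL; the ids themselves are always passed as parameters.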
Something like:
SELECT * FROM MyFullRecord where RecordID in (11,5,3,25,30)
ORDER BY
CHARINDEX(','+CAST(RecordID AS varchar)+',',
','+'11,5,3,25,30'+',')
SQLFiddle demo

AVG and COUNT in SQL Server

I have a rating system in which any person may review any other. Each person can be rated by the same person more than once. For the calculation of averages, I would like to include only the most recent values.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <- ignored because there is a newer rating of person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all Evaluatee's, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
This might look like it has an implicit assumption that no entries exist with the same date, but that's actually not a problem: if such entries can exist, then you cannot decide which of them was made later anyway; you could only choose randomly between them. As shown here, they are both included and averaged, which might be the best solution you can get for that border case (although it slightly favors that evaluator, giving them two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index - the obvious primary key choice here being the columns (Evaluator, Evaluatee, Date).
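The NOT EXISTS query and its behaviour on same-date entries can be tried with Python's sqlite3 module. This is a sketch under assumptions: the date column is renamed RatedOn and stored as ISO text, the two tied rows from 'Person 4' are invented for the demonstration, and SQLite's AVG returns a float where SQL Server would return an integer for an int column.

```python
import sqlite3

LATEST_AVG = """
    select Evaluatee, avg(Rating)
    from Ratings r1
    where not exists (
        select 1
        from Ratings r2
        where r1.Evaluatee = r2.Evaluatee
          and r1.Evaluator = r2.Evaluator
          and r1.RatedOn < r2.RatedOn
    )
    group by Evaluatee
    order by Evaluatee
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Ratings (Evaluator text, Evaluatee text, Rating integer, RatedOn text)"
)
conn.executemany(
    "insert into Ratings values (?, ?, ?, ?)",
    [
        ("Person 1", "Person 2", 5, "2011-02-01"),
        ("Person 1", "Person 2", 2, "2011-03-01"),
        ("Person 2", "Person 1", 6, "2011-02-01"),
        ("Person 2", "Person 1", 3, "2011-03-01"),
        ("Person 3", "Person 1", 5, "2011-05-01"),
    ],
)
base = conn.execute(LATEST_AVG).fetchall()

# Two ratings by the same evaluator on the same date: neither is "newer",
# so both survive the NOT EXISTS filter and both enter the average.
conn.executemany(
    "insert into Ratings values (?, ?, ?, ?)",
    [
        ("Person 4", "Person 2", 4, "2011-04-01"),
        ("Person 4", "Person 2", 6, "2011-04-01"),
    ],
)
with_tie = conn.execute(LATEST_AVG).fetchall()

print(base)
print(with_tie)
```

After the tied rows are added, Person 2 is averaged over three values (2, 4 and 6), which is the "two votes" effect described above.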
create table #T
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into #T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from #T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
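The same ROW_NUMBER query also runs in SQLite, which makes it easy to try from Python. This sketch assumes an SQLite build with window functions (3.25 or later) and uses a regular table named Ratings in place of the temporary table; note that SQLite's AVG returns floats.

```python
import sqlite3

# Window functions such as row_number() need SQLite 3.25 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Ratings ("
    " evaluator integer, evaluatee integer, rating integer, ratedate text)"
)
conn.executemany(
    "insert into Ratings values (?, ?, ?, ?)",
    [
        (1, 2, 5, "2011-01-02"),
        (1, 2, 2, "2011-01-03"),
        (2, 1, 6, "2011-01-02"),
        (2, 1, 3, "2011-01-03"),
        (3, 1, 5, "2011-01-05"),
    ],
)

# Number each evaluator's ratings of a person from newest to oldest,
# keep only the newest (rn = 1), then average per evaluatee.
averages = conn.execute(
    """
    select evaluatee, avg(rating) as avgrating
    from (
        select evaluatee,
               rating,
               row_number() over (
                   partition by evaluatee, evaluator
                   order by ratedate desc
               ) as rn
        from Ratings
    ) as latest
    where rn = 1
    group by evaluatee
    order by evaluatee
    """
).fetchall()

print(averages)
```

Unlike the NOT EXISTS form, row_number() keeps exactly one row per evaluator even when two ratings share a date; which of the tied rows wins is arbitrary unless the ORDER BY gets a tie-breaker.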
This is possible to do, but it can get REALLY hairy: SQL was not designed to compare rows, only columns. I would strongly recommend you keep an additional table containing only the most recent data, and store the rest in an archive table.
If you must do it this way, then I'll need a full table structure to try to write a query for this. In particular I need to know which are the unique indexes.

Select distinct rows based on some, but not all columns

I originally ran into this problem while working on SQL queries that select certain aggregate values (min, max etc) from grouped results. For example, select the cheapest fruit, its variety and the price, off each fruit group. The common solution is to first group the fruits along with the cheapest price using MIN, then self join it to get the other column ("variety" in this case).
Now say if we have more than one variety of a fruit with the same price, and that price happened to be the lowest price. So we end up getting results like this:
Apple Fuji 5.00
Apple Green 5.00
Orange valencia 3.00
Pear bradford 6.00
How do I make it so that only one kind of apple shows up in the final result? It can be any one of the varieties, be it the record that shows up the first, last or random.
So basically I need to eliminate rows based on two of the three columns being equal, and it doesn't matter which rows get eliminated as long as there is one left in the final result set.
Any help would be appreciated.
Try this... I added more fruits. The way to read it is to start from the innermost FROM clause and work your way out.
create table fruit (
FruitName varchar(50) not null,
FruitVariety varchar(50) not null,
Price decimal(10,2) not null
)
insert into fruit (FruitName, FruitVariety, Price)
values ('Apple','Fuji',5.00)
insert into fruit (FruitName, FruitVariety, Price)
values ('Apple','Green',5.00)
insert into fruit (FruitName, FruitVariety, Price)
values ('Orange','Valencia',3.00)
insert into fruit (FruitName, FruitVariety, Price)
values ('Orange','Navel',5.00)
insert into fruit (FruitName, FruitVariety, Price)
values ('Pear','Bradford',6.00)
insert into fruit (FruitName, FruitVariety, Price)
values ('Pear','Nashi',8.00)
select
rankedCheapFruits.FruitName,
rankedCheapFruits.FruitVariety,
rankedCheapFruits.Price
from (
select
f.FruitName,
f.FruitVariety,
f.Price,
row_number() over(
partition by f.FruitName
order by f.FruitName, f.FruitVariety
) as FruitRank
from (
select
f.FruitName,
min(f.Price) as LowestPrice
from Fruit f
group by
f.FruitName
) as cheapFruits
join Fruit f on cheapFruits.FruitName = f.FruitName
and f.Price = cheapFruits.LowestPrice
) rankedCheapFruits
where rankedCheapFruits.FruitRank = 1
You could apply MIN to the variety column as well, which would limit the result to the first (alphabetically lowest) variety per fruit.
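One way to read the MIN suggestion is sketched below with Python's sqlite3 module: join to the cheapest price per fruit as before, then group again and take MIN of the variety so that ties collapse to one row. The exact query shape is an assumption of this sketch, and the data is the sample set from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruit (FruitName text not null,"
    " FruitVariety text not null, Price real not null)"
)
conn.executemany(
    "insert into fruit values (?, ?, ?)",
    [
        ("Apple", "Fuji", 5.00),
        ("Apple", "Green", 5.00),
        ("Orange", "Valencia", 3.00),
        ("Orange", "Navel", 5.00),
        ("Pear", "Bradford", 6.00),
        ("Pear", "Nashi", 8.00),
    ],
)

# Keep only rows at the lowest price per fruit, then collapse ties
# by taking the alphabetically first variety.
cheapest = conn.execute(
    """
    select f.FruitName, min(f.FruitVariety) as FruitVariety, f.Price
    from fruit f
    inner join (
        select FruitName, min(Price) as LowestPrice
        from fruit
        group by FruitName
    ) as c on c.FruitName = f.FruitName and f.Price = c.LowestPrice
    group by f.FruitName, f.Price
    order by f.FruitName
    """
).fetchall()

for row in cheapest:
    print(row)
```

This only works cleanly while a single extra column is needed; with several extra columns, independent MINs could mix values from different rows, and a row_number() ranking is the safer choice.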
One option is to rank the rows based on some criteria (alphabetical order of fruit variety) and then pick the minimum of the rank.
There is a rank() function in ms-sql for exactly this purpose.