Selecting specific distinct column in SQL

I am trying to write a select statement that is distinct on one specific column, so that each fruit appears only once within each id. If a fruit has multiple rows under an id, I would like to use only one approved row in preference to a rotten one. If there is only one row for that fruit under that id, use it.
SELECT id, fruit, fruitweight, status
FROM myfruits
Raw data from current select
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | apple | .8 | approved
1 | apple | .1 | rotten
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
2 | orange | .5 | approved
How it should be formatted after constraint
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
I can do something along the lines of select distinct id,fruit,fruitweight,status from myfruits,
but that will only take out the duplicates if all columns are the same.

CTE with aggregate and row_number.
create table #YourTable (id int, fruit varchar(64), fruitweight decimal(2,1), status varchar(64))
insert into #YourTable
values
(1,'apple',0.2,'approved'),
(1,'apple',0.8,'approved'),
(1,'apple',0.1,'rotten'),
(1,'orange',0.5,'approved'),
(2,'grape',0.1,'rotten'),
(2,'orange',0.7,'approved'),
(2,'orange',0.5,'approved')
;with cte as(
select
id
,fruit
,fruitweight = min(fruitweight)
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end)
from
#YourTable
group by
id,fruit,status)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
Another method, without the aggregate, assuming you want the smallest fruitweight:
;with cte as(
select
id
,fruit
,fruitweight
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end, fruitweight)
from
#YourTable)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
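For anyone who wants to try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions) and uses the question's table name myfruits; the names ranked and rn are only illustrative. SQLite's dialect differs from T-SQL in places, so treat it as a check of the logic rather than of the exact syntax above.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myfruits (id INTEGER, fruit TEXT, fruitweight REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO myfruits VALUES (?, ?, ?, ?)",
    [
        (1, "apple", 0.2, "approved"),
        (1, "apple", 0.8, "approved"),
        (1, "apple", 0.1, "rotten"),
        (1, "orange", 0.5, "approved"),
        (2, "grape", 0.1, "rotten"),
        (2, "orange", 0.7, "approved"),
        (2, "orange", 0.5, "approved"),
    ],
)

# Number the rows within each (id, fruit) pair: approved rows first,
# then the lightest weight, and keep only the first row of each pair.
QUERY = """
WITH ranked AS (
    SELECT id, fruit, fruitweight, status,
           ROW_NUMBER() OVER (
               PARTITION BY id, fruit
               ORDER BY CASE WHEN status = 'approved' THEN 1 ELSE 2 END,
                        fruitweight
           ) AS rn
    FROM myfruits
)
SELECT id, fruit, fruitweight, status
FROM ranked
WHERE rn = 1
ORDER BY id, fruit
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Note that the grape row is kept even though it is rotten, because it is the only row for that id and fruit. Ordering by fruitweight picks the lightest approved row; swap in fruitweight DESC if the heaviest is wanted.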

Another option is using the WITH TIES clause.
Example
Select top 1 with ties *
From YourTable
Order By Row_Number() over (Partition By id,fruit order by status,fruitweight)

A shorter version of scsimon's solution without aggregates.
If you have SQL Server < 2012, you'll have to use case instead of iif.
select
id
,fruit
,fruitweight
,status
from
(
select
id
,fruit
,fruitweight
,status
,rownum = row_number() over(partition by id, fruit order by iif(status = 'approved', 0, 1), fruitweight desc)
from myfruits
) x
where rownum = 1
EDIT: I started writing before scsimon edited his post to include a version without aggregates...

Related

Select row A if a condition satisfies else select row B for each group

We have 2 tables, bookings and docs
bookings
booking_id | name
100 | "Val1"
101 | "Val5"
102 | "Val6"
docs
doc_id | booking_id | doc_type_id
6 | 100 | 1
7 | 100 | 2
8 | 101 | 1
9 | 101 | 2
10 | 101 | 2
We need the result like this:
booking_id | doc_id
100 | 7
101 | 10
Essentially, we are trying to get the latest record of doc per booking, but if doc_type_id 2 is present, select the latest record of doc type 2 else select latest record of doc_type_id 1.
Is this possible to achieve with a performance friendly query as we need to apply this in a very huge query?
You can do it with the FIRST_VALUE() window function by sorting the rows for each booking_id so that the rows with doc_type_id = 2 come first:
SELECT DISTINCT booking_id,
FIRST_VALUE(doc_id) OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) doc_id
FROM docs;
If you want full rows returned then you could use ROW_NUMBER() window function:
SELECT booking_id, doc_id, doc_type_id
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) rn
FROM docs
) t
WHERE rn = 1;
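As a rough check of the "prefer doc type 2, else fall back to the latest doc" logic, here is a small sketch using Python's built-in sqlite3 (SQLite 3.25+ assumed for window functions). The rows for booking 102 are hypothetical and were added only to exercise the fallback. The sort key is written as a CASE expression because, as far as I know, not every engine accepts a bare comparison such as doc_type_id = 2 in ORDER BY (SQL Server does not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (doc_id INTEGER, booking_id INTEGER, doc_type_id INTEGER)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (6, 100, 1),
        (7, 100, 2),
        (8, 101, 1),
        (9, 101, 2),
        (10, 101, 2),
        # Hypothetical rows: booking 102 has no doc of type 2.
        (11, 102, 1),
        (12, 102, 1),
    ],
)

# Type-2 docs sort first within each booking, then the highest doc_id.
QUERY = """
SELECT booking_id, doc_id
FROM (
    SELECT booking_id, doc_id,
           ROW_NUMBER() OVER (
               PARTITION BY booking_id
               ORDER BY CASE WHEN doc_type_id = 2 THEN 0 ELSE 1 END,
                        doc_id DESC
           ) AS rn
    FROM docs
) AS t
WHERE rn = 1
ORDER BY booking_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```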

A running summary of totals in SQL Server

I have come up against an issue where I want to summarize results in a query.
Example as follows:
NAME | FRUIT | PRICE
-----+-------+------
JOHN | APPLE | 2
JOHN | APPLE | 2
JOHN | APPLE | 2
JOHN | APPLE | 2
DAVE | GRAPE | 3
DAVE | GRAPE | 3
DAVE | GRAPE | 3
This is my table at the moment. What I need, though, is a summary of John's business, like below:
NAME | FRUIT | PRICE
-----+-------+------
JOHN | APPLE | 2
JOHN | APPLE | 2
JOHN | APPLE | 2
JOHN | APPLE | 2
JOHN | TOTAL | 8
DAVE | GRAPE | 3
DAVE | GRAPE | 3
DAVE | GRAPE | 3
DAVE | TOTAL | 9
I have tried to group the information, but it does not give what I want. Also, if John had different fruit, each fruit would need to be summed before moving on to the next one, and there needs to be a running total for every value in the NAME field, as there will be a number of customers.
Any advice would be great
EDIT
I have tried using ROLLUP, but I keep getting the totals of all values in a separate column, whereas I would like to see them formatted as above.
A solution with UNION and GROUP BY.
;WITH PricesWithTotals AS
(
SELECT
Name,
Fruit,
Price
FROM
YourTable
UNION ALL
SELECT
Name,
Fruit = 'TOTAL',
Price = SUM(Price)
FROM
YourTable
GROUP BY
Name
)
SELECT
Name,
Fruit,
Price
FROM
PricesWithTotals
ORDER BY
Name,
CASE WHEN Fruit <> 'TOTAL' THEN 1 ELSE 999 END ASC,
Fruit
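Here is a minimal runnable sketch of the UNION ALL idea, using Python's built-in sqlite3 instead of SQL Server. The table name sales and the is_total column are assumptions made for the example. Sorting on an explicit flag, rather than comparing the fruit text to 'TOTAL', avoids depending on the collation's case sensitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, fruit TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("JOHN", "APPLE", 2)] * 4 + [("DAVE", "GRAPE", 3)] * 3,
)

# Detail rows carry is_total = 0; one summary row per name carries 1,
# so each summary sorts directly after that name's detail rows.
QUERY = """
SELECT name, fruit, price
FROM (
    SELECT name, fruit, price, 0 AS is_total
    FROM sales
    UNION ALL
    SELECT name, 'TOTAL', SUM(price), 1
    FROM sales
    GROUP BY name
) AS combined
ORDER BY name, is_total, fruit
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The rows come back in alphabetical order by name (DAVE before JOHN); matching the original display order would need some other sort key that the sample data does not contain.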
This will get you a running total per customer per fruit:
create table #Sales([Name] varchar(20), Fruit varchar(20), Price int)
insert into #Sales([Name], Fruit, Price)
values
('JOHN','APPLE',2),
('JOHN','APPLE',2),
('JOHN','APPLE',2),
('JOHN','APPLE',2),
('DAVE','GRAPE',3),
('DAVE','GRAPE',3),
('DAVE','GRAPE',3)
Select c.*
, SUM(Price) OVER (PARTITION BY c.[Name], c.[Fruit] ORDER BY c.[Name], c.[Fruit] rows between unbounded preceding and current ROW ) as RunningTotal
from #Sales c
order by c.[Name], c.[Fruit] asc
drop table #Sales
The solution to your problem is GROUPING SETS. However, your rows are not unique, so this adds a unique sequence number to each row so that the original rows survive the grouping:
with numbered as (
select t.*, row_number() over (order by (select null)) as seqnum
from t
)
select name,
coalesce(fruit, 'Total') as fruit,
sum(price) as price
from numbered
group by grouping sets ( (name, fruit, seqnum), (name) )
order by name,
(case when fruit is not null then 1 else 2 end);

Selecting Top 1 for Every ID

I have the following table:
| ID | ExecOrd | date |
| 1 | 1.0 | 3/4/2014|
| 1 | 2.0 | 7/7/2014|
| 1 | 3.0 | 8/8/2014|
| 2 | 1.0 | 8/4/2013|
| 2 | 2.0 |12/2/2013|
| 2 | 3.0 | 1/3/2014|
| 2 | 4.0 | |
I need to get the date of the top ExecOrd per ID of about 8000 records, and so far I can only do it for one ID:
SELECT TOP 1 date
FROM TABLE
WHERE DATE IS NOT NULL and ID = '1'
ORDER BY ExecOrd DESC
A little help would be appreciated. I have been trying to find a similar question to mine with no success.
There are several ways of doing this. A generic approach is to join the table back to itself using max():
select t.date
from yourtable t
join (select max(execord) execord, id
from yourtable
group by id
) t2 on t.id = t2.id and t.execord = t2.execord
If you're using SQL Server 2005+, I prefer to use row_number():
select date
from (
select row_number() over (partition by id order by execord desc) rn, date
from yourtable
) t
where rn = 1;
Note: they will give different results if ties exist.
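To make the point about ties concrete, here is a small sketch with Python's built-in sqlite3 (SQLite 3.25+ assumed). The table name execs, the column name exec_date, and the tied row for ID 1 are all hypothetical; the question's data has no ties. The join on MAX() returns every row that shares the top ExecOrd, while ROW_NUMBER() returns exactly one row per ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE execs (id INTEGER, execord REAL, exec_date TEXT)")
conn.executemany(
    "INSERT INTO execs VALUES (?, ?, ?)",
    [
        (1, 1.0, "2014-03-04"),
        (1, 3.0, "2014-08-08"),
        (1, 3.0, "2014-09-09"),  # hypothetical tie on the top ExecOrd
        (2, 1.0, "2013-08-04"),
        (2, 2.0, "2013-12-02"),
    ],
)

# Approach 1: join back to the per-id maximum. Ties are all returned.
JOIN_QUERY = """
SELECT t.id, t.exec_date
FROM execs AS t
JOIN (SELECT id, MAX(execord) AS execord FROM execs GROUP BY id) AS m
  ON t.id = m.id AND t.execord = m.execord
ORDER BY t.id, t.exec_date
"""

# Approach 2: ROW_NUMBER keeps one row per id. The extra sort key
# (exec_date DESC) makes the choice between tied rows deterministic.
ROW_NUMBER_QUERY = """
SELECT id, exec_date
FROM (
    SELECT id, exec_date,
           ROW_NUMBER() OVER (
               PARTITION BY id ORDER BY execord DESC, exec_date DESC
           ) AS rn
    FROM execs
) AS t
WHERE rn = 1
ORDER BY id
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
row_number_rows = conn.execute(ROW_NUMBER_QUERY).fetchall()
print(join_rows)
print(row_number_rows)
```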
;with cte as (
SELECT id, date, row_number() over(partition by ID order by ExecOrd DESC) r
FROM TABLE WHERE DATE IS NOT NULL )
select id, date from
cte where r=1

Selecting row with highest ID based on another column

In SQL Server 2008 R2, suppose I have a table layout like this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 1 | 1 | TEST 1 |
| 2 | 1 | TEST 2 |
| 3 | 3 | TEST 3 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 6 | 6 | TEST 6 |
| 7 | 6 | TEST 7 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Is it possible to select the row with the highest UniqueID number for each GroupID? So according to the table above, if I ran the query, I would expect this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 2 | 1 | TEST 2 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Been chomping on this for a while, but can't seem to crack it.
Many thanks,
SELECT *
FROM (SELECT uniqueid, groupid, title,
Row_number()
OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn
FROM table) a
WHERE a.rn = 1
With SQL-Server as rdbms you can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT UniqueID, GroupID, Title,
RN = ROW_NUMBER() OVER (PARTITION BY GroupID
ORDER BY UniqueID DESC)
FROM dbo.TableName
)
SELECT UniqueID, GroupID, Title
FROM CTE
WHERE RN = 1
This returns exactly one record for each GroupID even if there are multiple rows with the highest UniqueID (the name does not suggest so). If you want to return all rows in that case, use DENSE_RANK instead of ROW_NUMBER.
Here you can see all functions and how they work: http://technet.microsoft.com/en-us/library/ms189798.aspx
Since you have not mentioned any RDBMS, this statement below will work on almost all RDBMS. The purpose of the subquery is to get the greatest uniqueID for every GROUPID. To be able to get the other columns, the result of the subquery is joined on the original table.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT GroupID, MAX(uniqueID) uniqueID
FROM tableName
GROUP By GroupID
) b ON a.GroupID = b.GroupID
AND a.uniqueID = b.uniqueID
If your RDBMS supports analytic functions, you can use ROW_NUMBER():
SELECT uniqueid, groupid, title
FROM
(
SELECT uniqueid, groupid, title,
ROW_NUMBER() OVER (PARTITION BY groupid
ORDER BY uniqueid DESC) rn
FROM tableName
) x
WHERE x.rn = 1
TSQL Ranking Functions
ROW_NUMBER() generates a sequential number that you can filter on. In this case the numbering restarts for each groupid and follows uniqueid in descending order, so the row with the greatest uniqueid in each group gets rn = 1.
SELECT *
FROM the_table tt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.GroupID = tt.GroupID
AND nx.UniqueID > tt.UniqueID
)
;
This should work in any DBMS (no window functions or CTEs are needed) and is probably faster than a subquery with an aggregate.
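For a quick check of the NOT EXISTS form against the question's sample data, here is a sketch using Python's built-in sqlite3. The table name the_table follows the query above; nothing here measures performance, it only confirms which rows are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (UniqueID INTEGER, GroupID INTEGER, Title TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, 1, "TEST 1"),
        (2, 1, "TEST 2"),
        (3, 3, "TEST 3"),
        (4, 3, "TEST 4"),
        (5, 5, "TEST 5"),
        (6, 6, "TEST 6"),
        (7, 6, "TEST 7"),
        (8, 6, "TEST 8"),
    ],
)

# Keep a row only when no other row in the same group has a higher UniqueID.
QUERY = """
SELECT tt.UniqueID, tt.GroupID, tt.Title
FROM the_table AS tt
WHERE NOT EXISTS (
    SELECT 1
    FROM the_table AS nx
    WHERE nx.GroupID = tt.GroupID
      AND nx.UniqueID > tt.UniqueID
)
ORDER BY tt.UniqueID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```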
Keeping it simple:
select * from test2
where UniqueID in (select max(UniqueID) from test2 group by GroupID)
Considering:
create table test2
(
UniqueID numeric,
GroupID numeric,
Title varchar(100)
)
insert into test2 values(1,1,'TEST 1')
insert into test2 values(2,1,'TEST 2')
insert into test2 values(3,3,'TEST 3')
insert into test2 values(4,3,'TEST 4')
insert into test2 values(5,5,'TEST 5')
insert into test2 values(6,6,'TEST 6')
insert into test2 values(7,6,'TEST 7')
insert into test2 values(8,6,'TEST 8')

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
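The LAG plus running-sum steps can be tried out with Python's built-in sqlite3 (SQLite 3.25+ assumed). This sketch uses the sequences from the clarified requirements, "1,1,2,2,3,4" followed by "1,2,4,6", with ids 1 to 10. The table and column names are illustrative, and CASE stands in for IIF so that it also runs on older SQLite builds. It says nothing about the execution plan on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, rownumber INTEGER)"
)
sequence = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6]
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    list(enumerate(sequence, start=1)),  # (id, rownumber) pairs
)

# Step 1: fetch the previous row's rownumber.
# Step 2: flag a row as a group start when there is no previous row
#         or the previous rownumber is higher than the current one.
# Step 3: a running sum of the flags gives the group number.
QUERY = """
WITH t1 AS (
    SELECT id, rownumber,
           LAG(rownumber) OVER (ORDER BY id) AS prev_rownumber
    FROM your_table
), t2 AS (
    SELECT id, rownumber,
           CASE WHEN prev_rownumber IS NULL OR prev_rownumber > rownumber
                THEN 1 ELSE 0 END AS new_group
    FROM t1
)
SELECT id, rownumber,
       SUM(new_group) OVER (
           ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS grp
FROM t2
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```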
If the ids are truly sequential and RowNumber increases by exactly one within each group, you can do:
select t.*,
(id - rowNumber) as grp
from t
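A small sketch with Python's built-in sqlite3 shows when the id minus rowNumber trick holds. The helper group_keys is hypothetical and exists only for this example. With the question's original data the difference is constant within each group, but with a gapped sequence such as "1,2,4,6" from the clarified requirements a single group is split into several keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, rownumber INTEGER)")


def group_keys(rownumbers):
    """Load the sequence with ids 1..n and return id - rownumber per row."""
    conn.execute("DELETE FROM t")
    conn.executemany(
        "INSERT INTO t (id, rownumber) VALUES (?, ?)",
        list(enumerate(rownumbers, start=1)),
    )
    cursor = conn.execute("SELECT id - rownumber AS grp FROM t ORDER BY id")
    return [grp for (grp,) in cursor]


# Original sample data: three groups, each with consecutive row numbers.
consecutive = group_keys([1, 2, 3, 1, 2, 1, 2, 3, 4])
# One group with gaps in its row numbers.
gapped = group_keys([1, 2, 4, 6])

print(consecutive)
print(gapped)
```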
You can also use a recursive CTE:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.