Select all from a table, where 2 columns are Distinct - sql

Hi, I have a table of deals. I need to return the entire table, but with the title and the price distinct, as there are quite a few double-ups. I've put an example scenario below:
Col ID || Col Title || Col Price || Col Source
a b c d
a b c b
b a a c
b a a 1
Expected result:
a b c d
b a a c
I'm not sure whether to use DISTINCT or GROUP BY here; any suggestions would be appreciated.
Cheers
Scott
=======================
Looking at some of your suggestions, I'm going to have to rethink this. Thanks, guys.

This will pick one row for each distinct (Price, Title) pair. The ORDER BY inside the window decides which one (here the row with the lowest Source; ties are broken arbitrarily):
;WITH myCTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Price, Title ORDER BY Source) AS rn
FROM
MyTable
)
SELECT
*
FROM
myCTE
WHERE
rn = 1
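As a rough way to try the ROW_NUMBER() approach on the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the helper name dedupe_deals and the TEXT column types are made up for the illustration and are not part of the original answer.

```python
import sqlite3


def dedupe_deals(rows):
    """Return one row per (Price, Title) pair, keeping the lowest Source.

    Hypothetical helper: loads the rows into an in-memory table named
    MyTable and runs the ROW_NUMBER() query against it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (ID TEXT, Title TEXT, Price TEXT, Source TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH myCTE AS (
            SELECT ID, Title, Price, Source,
                   ROW_NUMBER() OVER (
                       PARTITION BY Price, Title ORDER BY Source
                   ) AS rn
            FROM MyTable
        )
        SELECT ID, Title, Price, Source
        FROM myCTE
        WHERE rn = 1
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample rows from the question: (ID, Title, Price, Source)
sample = [
    ("a", "b", "c", "d"),
    ("a", "b", "c", "b"),
    ("b", "a", "a", "c"),
    ("b", "a", "a", "1"),
]
print(dedupe_deals(sample))
```

Because the window orders by Source, this keeps the rows with Source b and 1 rather than d and c as in the question's expected result; changing the ORDER BY changes which duplicate survives.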

You can use GROUP BY, but then only Title and Price can be returned as they are; ID and Source would have to be ignored (or aggregated).

You are asking for the entire table, but your sample output drops two records, and with them their 'Col Source' values:
a b c b
b a a 1
GROUP BY lets you write a very simple query, as long as the columns you do not group by are aggregated (MIN is used here as one possible choice):
select min(id) as id, title, price, min(source) as source from MyTable group by title, price

DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same for both constructs. GROUP BY should be used to apply aggregate operators to each group; if all you need is to remove duplicates, use DISTINCT. If you are using sub-queries, the execution plan can vary, so check the plan before deciding which is faster.
You should go for GROUP BY here, since you need all the columns in your result set; DISTINCT on its own only returns the unique list of the columns you select. Columns that are not in the GROUP BY have to be wrapped in an aggregate (MIN is used here as one option):
SELECT MIN(ID) AS ID, Title, Price, MIN(Source) AS Source
FROM MyTable AS t
GROUP BY Title, Price

Related

How can I select the largest value when an ungrouped column is dependent on a grouped column?

I am new to Redshift. I have two tables, ticket_booking and ticket_review, the relation of the two tables is one - many. Which when combined looks like:
The result I am looking for (I want to get the highest number per ticket_booking id) is:
I tried to obtain the desired result using the group by command to help distinct records. See script below:
select b.id, r.id, max(r.number) as revision_number
from dw.ticket_review as r, dw.ticket_booking as b
where r.ticket_booking_id = b.id
group by b.id
However, I get an error column "r.id" must appear in the GROUP BY clause or be used in an aggregate function. If I do this I get the result of the first picture. I tried different approaches mentioned in different questions but none seem to help me with my situation. Any help would be deeply appreciated! :)
For each booking, assign row numbers ordered from the highest review number down, then keep only the first row:
select booking_id, ticket_review, number
from (select b.id as booking_id, r.id as ticket_review, r.number,
row_number() over (partition by b.id order by r.number desc) rn
from dw.ticket_review as r, dw.ticket_booking as b
where r.ticket_booking_id = b.id) x
where rn = 1;

SQL Select unique combinations of rows for other column value

I’m trying to do an analysis of the different combinations of taxes per invoice to identify how many scenarios exist.
In the tax table, column 1 is invoiceNo, column 2 is taxType. These form the composite key. There can be 1 or more taxType per invoiceNo. Example of data:
https://i.imgur.com/bcQc7vY_d.jpg?maxwidth=640&shape=thumb&fidelity=medium (Sorry but i’m new so can’t add picture).
I want to be able to report on the unique taxType combinations across invoices. I.e., invoice 1 (A) is unique combination 1, invoice 2 (AB) is unique combination 2, invoice 3 (A) is disregarded as it was already returned for invoice 1, and invoice 4 (BC) is unique combination 3.
Not sure if this makes sense! Finding it hard to articulate what I’m after!
Expected output would be:
A
AB
BC
The original version of this question was tagged MySQL, so this answers the question.
If I understand correctly, you can use group_concat():
select distinct group_concat(taxtype order by taxtype)
from t
group by invoiceno;
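To make the logic of that query concrete, here is a small Python sketch of the same idea: collect the tax types per invoice, sort and concatenate them, then keep the distinct strings. The function name unique_tax_combinations is hypothetical, and the sample rows are taken from the combinations described in the question (1 A, 2 AB, 3 A, 4 BC). Note that MySQL's group_concat() separates values with a comma by default; the sketch joins without a separator to match the expected output.

```python
from collections import defaultdict


def unique_tax_combinations(rows):
    """Return the distinct tax-type combinations, one string per combination.

    rows is an iterable of (invoiceNo, taxType) pairs.
    """
    by_invoice = defaultdict(set)
    for invoice_no, tax_type in rows:
        by_invoice[invoice_no].add(tax_type)
    # Sorting inside each invoice makes {A, B} and {B, A} the same string.
    combos = {"".join(sorted(types)) for types in by_invoice.values()}
    return sorted(combos)


rows = [(1, "A"), (2, "A"), (2, "B"), (3, "A"), (4, "B"), (4, "C")]
print(unique_tax_combinations(rows))  # ['A', 'AB', 'BC']
```

Sorting within each invoice plays the same role as the ORDER BY inside group_concat(): without it, the same set of tax types could be reported twice in different orders.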
The following query works with the table you have given, and would work with those combinations of tax types even if they repeat. If there are more tax codes, or there is an AC combination, or some of the given combinations are omitted, it might behave a little differently. You could develop this to suit the conditions, or you could give some more info: Do invoices have three codes (ABC)? Do invoices have just B or just C codes? I notice that the BC invoice etc
WITH CTE (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY a.InvoiceNo),a.InvoiceNo,a.TaxType,b.TaxType
FROM UniqueCombo a INNER JOIN UniqueCombo b ON a.InvoiceNo=b.InvoiceNo
)
,
CTE2 (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT * FROM CTE WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY InvoiceNo HAVING COUNT(InvoiceNo)=1
)
)
SELECT TT1 FROM CTE2 WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY TT1,TT2 HAVING COUNT(InvoiceNo)>1
)
UNION
SELECT TT1+''+TT2 FROM CTE WHERE RN IN
(
SELECT MAX(RN)-1 FROM CTE WHERE TT1<>TT2 GROUP BY InvoiceNo
)
You can try STRING_AGG. Something like:
SELECT DISTINCT TaxTypeString
FROM
(
SELECT InvoiceNo, STRING_AGG(TaxType, '') AS TaxTypeString
FROM t
GROUP BY InvoiceNo
) x
ORDER BY TaxTypeString
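The nested query, called x, should give you one row per invoice number, in the format you want. Then you select the distinct strings from there. Be aware that STRING_AGG does not guarantee the concatenation order unless you specify one, so the same set of tax types could come out as both AB and BA.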

SQL Separating Distinct Values using single column

Does anyone happen to know a way of basically taking the 'Distinct' command but only using it on a single column. For lack of example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of row with duplicate ID's but still use the other column information. I would use distinct on the full query but the rows are all different due to certain columns data set. And I would need to output only the top most term between the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, t.Term
from Table t
join (select ID, min(Term) as Term from Table group by ID) d
on d.ID = t.ID and d.Term = t.Term
You can use ROW_NUMBER for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY Term) as rn from Table
) as tbl
Where rn = 1
The ORDER BY determines which row is picked first; ordering by Term keeps the topmost term for each ID.

How to apply Count on multiple distinct columns and use Having clause

I would like to do something like this, but I'm getting an error. Please suggest a good method.
select A,B,C, count(Distinct A,B,C)
from table_name
group by A,B,C
having count(Distinct A,B,C) > 1
Basically I have an index on the columns (A,B,C), and some rows don't have a unique combination of them, so I'm trying a query like this to identify the rows that violate the unique constraint. Please let me know if there is a better way.
If you group by these columns, you already get one row per unique combination, and you can then use count(*) to see how many duplicates each one has:
select A,B,C, count(*)
from table_name
group by A,B,C
HAVING count(*) > 1
What @jurgend said is right, and you can further find the exact rows (I'm assuming there are more fields to look at, including maybe a PK) by doing:
SELECT *
FROM table_name
WHERE (A,B,C) IN (
SELECT A, B, C
FROM table_name
GROUP BY A, B, C
HAVING COUNT(*) > 1
)
A tuple IN list query like this works in Oracle, although not in every other DBMS.
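Both queries can be tried out with a short sketch using Python's sqlite3 module. This assumes an SQLite version that supports row values in IN (3.15 or later); the id primary-key column and the helper name find_duplicates are invented for the example, since the question does not show the rest of the table.

```python
import sqlite3


def find_duplicates(rows):
    """Return (duplicate groups, offending rows) for the (A, B, C) columns.

    rows is a list of (id, A, B, C) tuples; id is a hypothetical primary key.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (id INTEGER PRIMARY KEY, A TEXT, B TEXT, C TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_name (id, A, B, C) VALUES (?, ?, ?, ?)", rows
    )
    # One row per (A, B, C) combination that occurs more than once.
    groups = conn.execute(
        """
        SELECT A, B, C, COUNT(*)
        FROM table_name
        GROUP BY A, B, C
        HAVING COUNT(*) > 1
        ORDER BY A, B, C
        """
    ).fetchall()
    # Every individual row that belongs to one of those combinations.
    offending = conn.execute(
        """
        SELECT id, A, B, C
        FROM table_name
        WHERE (A, B, C) IN (
            SELECT A, B, C
            FROM table_name
            GROUP BY A, B, C
            HAVING COUNT(*) > 1
        )
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return groups, offending


data = [
    (1, "x", "y", "z"),
    (2, "x", "y", "z"),
    (3, "x", "y", "w"),
    (4, "p", "q", "r"),
    (5, "p", "q", "r"),
    (6, "p", "q", "r"),
]
groups, offending = find_duplicates(data)
print(groups)
print(offending)
```

The first query tells you which combinations break uniqueness and how often; the second gives you the actual rows to clean up.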

Get row count including column values in sql server

I need to get the row count of a query, and also get the query's columns in one single query. The count should be a part of the result's columns (It should be the same for all rows, since it's the total).
for example, if I do this:
select count(1) from table
I can have the total number of rows.
If I do this:
select a,b,c from table
I'll get the column's values for the query.
What I need is to get the count and the columns values in one query, with a very effective way.
For example:
select Count(1), a,b,c from table
with no group by, since I want the total.
The only way I've found is to do a temp table (using variables), insert the query's result, then count, then returning the join of both. But if the result gets thousands of records, that wouldn't be very efficient.
Any ideas?
@Jim H is almost right, but chooses the wrong ranking function:
create table #T (ID int)
insert into #T (ID)
select 1 union all
select 2 union all
select 3
select ID,COUNT(*) OVER (PARTITION BY 1) as RowCnt from #T
drop table #T
Results:
ID RowCnt
1 3
2 3
3 3
Partitioning by a constant makes it count over the whole resultset.
Using CROSS JOIN:
SELECT a.*, b.numRows
FROM YOUR_TABLE a
CROSS JOIN (SELECT COUNT(*) AS numRows
FROM YOUR_TABLE) b
Look at the Ranking functions of SQL Server.
SELECT ROW_NUMBER() OVER (ORDER BY a) AS 'RowNumber', a, b, c
FROM table;
You could do it like this:
SELECT x.total, a, b, c
FROM
table
JOIN (SELECT total = COUNT(*) FROM table) AS x ON 1=1
which will return the total number of records in the first column, followed by fields a, b and c.
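For comparison, the windowed COUNT(*) and the CROSS JOIN variants can be run side by side with a small sqlite3 sketch. This is only an illustration against SQLite (window functions need 3.25 or later), not SQL Server, and the helper name rows_with_total is made up; the table T mirrors the #T example above.

```python
import sqlite3


def rows_with_total(values):
    """Return each ID together with the total row count, computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (ID INTEGER)")
    conn.executemany("INSERT INTO T (ID) VALUES (?)", [(v,) for v in values])
    # Window aggregate: an empty OVER () counts over the whole result set.
    windowed = conn.execute(
        "SELECT ID, COUNT(*) OVER () AS RowCnt FROM T ORDER BY ID"
    ).fetchall()
    # Cross join against a one-row subquery holding the total.
    joined = conn.execute(
        """
        SELECT a.ID, b.numRows
        FROM T a
        CROSS JOIN (SELECT COUNT(*) AS numRows FROM T) b
        ORDER BY a.ID
        """
    ).fetchall()
    conn.close()
    return windowed, joined


windowed, joined = rows_with_total([1, 2, 3])
print(windowed)  # [(1, 3), (2, 3), (3, 3)]
print(joined)    # [(1, 3), (2, 3), (3, 3)]
```

Both forms return the same rows; which one performs better on a large table depends on the engine, so it is worth checking the execution plan on your own data.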