Trying to select multiple columns where one is unique - sql

I am trying to select several columns from a table where one of the columns is unique. The select statement looks something like this:
select a, distinct b, c, d
from mytable
The table looks something like this:
| a | b | c | d | e |...
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5
| 1 | 2 | 3 | 4 | 6
| 2 | 5 | 7 | 1 | 9
| 7 | 3 | 8 | 6 | 4
| 7 | 3 | 8 | 6 | 7
So the query should return something like this:
| a | b | c | d |
|---|---|---|---|
| 1 | 2 | 3 | 4
| 2 | 5 | 7 | 1
| 7 | 3 | 8 | 6
I just want to remove all of the rows where b is duplicated.
EDIT: There seems to be some confusion about which row I want to be selected in the case of duplicate b values. I don't care because the a, c, and d should (but are not guaranteed to) be the same.

Try this
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) NO
,* FROM TableName) AS T1 WHERE NO = 1

I think you are nearly there with DISTINCT. Try:
SELECT DISTINCT a, b, c, d
FROM myTable

You haven't said how to pick a row for each b value, but this will pick one for each.
Select
a,
b,
c,
d,
e
From (
Select
a,
b,
c,
d,
e,
row_number() over (partition by b order by b) rn
From
mytable
) x
Where
x.rn = 1

If you don't care what values you get for B, C, D, and E, as long as they're appropriate for that key, you can group by A:
SELECT A, MIN(B), MIN(C), MIN(D), MIN(E)
FROM MyTable
GROUP BY A
Note that MAX() would be just as valid. Some RDBMSs support a FIRST() aggregate, or similar, for exactly these circumstances where you don't care which value you get (from a certain population).
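One caveat with per-column aggregates is worth checking: each MIN() is computed independently, so the output row may combine values from different source rows. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module (not the asker's database) and two hypothetical rows that share b but differ elsewhere. It groups by b, the column the question wants unique, rather than A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT)")

# Hypothetical rows: both have b = 2 but differ in a, c and d.
source_rows = [(1, 2, 3, 9), (5, 2, 1, 4)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", source_rows)

# Each MIN() looks at its own column only, so the result can mix rows.
mixed = conn.execute(
    "SELECT MIN(a), b, MIN(c), MIN(d) FROM mytable GROUP BY b"
).fetchall()

print(mixed)                     # [(1, 2, 1, 4)]
print(mixed[0] in source_rows)   # False: no such row exists in the table
```

When a, c, and d really are identical for every duplicate b, as the question expects, this difference does not show up.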

This will return what you're looking for but I think your example is flawed because you've no determinism over which value from the e column is returned.
Create Table A1 (a int, b int, c int, d int, e int)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,5)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,6)
INSERT INTO A1 (a,b,c,d,e) VALUES (2,5,7,1,9)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,4)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,7)
SELECT * FROM A1
SELECT a,b,c,d
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) RowNum ,*
FROM A1
) As InnerQuery WHERE RowNum = 1

You cannot put DISTINCT on a single column. You should put it right after the SELECT:
SELECT DISTINCT a, b, c, d
FROM mytable
It returns the result you need for your sample table. However, if you need to remove duplicates from only a single column (which DISTINCT cannot do), you have probably misunderstood something. Give us more description and sample data, and we will try to point you in the right direction.
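To see where SELECT DISTINCT and the ROW_NUMBER() answers diverge, here is a minimal sketch, assuming SQLite 3.25 or later (for window functions) through Python's sqlite3 module. It uses the question's sample rows plus one hypothetical extra row where b repeats but a differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT, e INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 3, 4, 5),
        (1, 2, 3, 4, 6),
        (2, 5, 7, 1, 9),
        (7, 3, 8, 6, 4),
        (7, 3, 8, 6, 7),
        (9, 3, 8, 6, 1),  # hypothetical extra row: b = 3 again, but a differs
    ],
)

# DISTINCT compares the whole (a, b, c, d) tuple, so b = 3 appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c, d FROM mytable ORDER BY b, a"
).fetchall()

# ROW_NUMBER() keeps exactly one row per b (here, the one with the lowest a).
one_per_b = conn.execute(
    """
    SELECT a, b, c, d
    FROM (
        SELECT a, b, c, d,
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS rn
        FROM mytable
    ) AS x
    WHERE rn = 1
    ORDER BY b
    """
).fetchall()

print(distinct_rows)
print(one_per_b)
```

Without the extra row, both queries return the same three rows, which is why DISTINCT is enough for the sample table as posted.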

Related

Combining multiple sub-selects into one result row

I'm a bit stuck on this supposedly basic SQL and would appreciate some pointers.
I'd like to get a single row result from combining multiple sub-selects. What I have so far (which of course does not work):
select * from (
(select count(*) from a where name='a') as a),
(select count(*) from b where name='d') as b)
) as foo;
...and I'm looking for a result along the lines of:
a | b
-----
1 | 2
Given the source tables:
Table a:
id | name
----+------
1 | a
2 | b
3 | c
Table b:
id | name
----+------
1 | a
2 | b
3 | c
4 | d
5 | d
I also tried something along the lines of
select count(a.*), count(b.*) from a, b where a.name='a' and b.name='d';
which produces:
count | count
------+-------
2 | 2
I'd appreciate any assistance.
thanks
Just use:
select (select count(*) from a where name='a') as a,
(select count(*) from b where name='d') as b
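A minimal sketch of why the scalar subqueries work while the comma join does not, assuming SQLite through Python's sqlite3 module and the two source tables from the question. Because SQLite does not accept count(a.*), the join below counts a.id and b.id instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INT, name TEXT)")
conn.execute("CREATE TABLE b (id INT, name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "d")],
)

# Each scalar subquery is evaluated on its own, giving one row with two counts.
combined = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM a WHERE name = 'a') AS a,
           (SELECT COUNT(*) FROM b WHERE name = 'd') AS b
    """
).fetchone()

# The comma join pairs the 1 matching row of a with the 2 matching rows of b,
# producing 2 joined rows, so both counts come out as 2.
cross = conn.execute(
    "SELECT COUNT(a.id), COUNT(b.id) FROM a, b "
    "WHERE a.name = 'a' AND b.name = 'd'"
).fetchone()

print(combined)  # (1, 2)
print(cross)     # (2, 2)
```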

Order rows based on aggregated data

I have a sample table like shown below :
select * from sampleTable;
label | data
-------+------
a | 1
b | 2
c | 3
d | 4
a | 5
b | 6
(6 rows)
I need the rows sorted by the summed values of the 'data' column per label, i.e. c (sum of 3) should come first and b (combined data of 2 and 6) should come last, with the others in between, as shown below:
label | data
-------+------
c | 3
d | 4
a | 1
a | 5
b | 2
b | 6
I have tried to achieve this with a self join as shown below. But it seems a bit verbose. Am I doing it right or is there a better way to achieve the same without joins?
select l, data from sampleTable join (select label as l, sum(data) as x from sampleTable group by l) m on label = m.l order by x;
l | data
---+------
c | 3
d | 4
a | 1
a | 5
b | 2
b | 6
(6 rows)
You can avoid the self-join by using a SUM with a windowed function, something like this:
SELECT label
, data
FROM (
SELECT *
, SUM(data) OVER (PARTITION BY label) pts
FROM sampleTable
) AS rez
ORDER BY pts
You don't need a self-join or a subquery. You can use window functions in the order by:
select t.*
from t
order by sum(data) over (partition by label),
label;
Note the inclusion of label as the second key. This is important for breaking ties in the summed data: it ensures that all rows for a given label appear together.
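A minimal sketch for checking the ordering against the question's sample data, assuming SQLite 3.25 or later through Python's sqlite3 module. It uses the subquery form with label as the second sort key; the third key, data, is an extra assumption added only to make the output fully deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampleTable (label TEXT, data INT)")
conn.executemany(
    "INSERT INTO sampleTable VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5), ("b", 6)],
)

# pts is the per-label total, repeated on every row of that label.
ordered = conn.execute(
    """
    SELECT label, data
    FROM (
        SELECT label, data,
               SUM(data) OVER (PARTITION BY label) AS pts
        FROM sampleTable
    ) AS rez
    ORDER BY pts, label, data
    """
).fetchall()

print(ordered)
# [('c', 3), ('d', 4), ('a', 1), ('a', 5), ('b', 2), ('b', 6)]
```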
Simply use the sum window function in ORDER BY
SELECT l, d
FROM tab
ORDER BY SUM(d) OVER (PARTITION BY l)

I want to group one column based on a condition on another column

Suppose I have a table like this :
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | A
2 | C
3 | A
3 | A
3 | B
3 | B
I want to write a query that groups the values in such a way that I get a table like this:
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | C
3 | A
3 | B
You are looking for DISTINCT. DISTINCT removes duplicates from your query result.
select distinct * from mytable;
An alternative would be aggregation. You'd get a result row per a and b by grouping by them. You'd only use this however, when you want aggregates, e.g. the number of rows, a sum, an average, etc., because otherwise you can use DISTINCT as shown and should prefer it.
select a, b, count(*) from mytable group by a, b;
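A minimal sketch comparing the two queries on the question's data, assuming SQLite through Python's sqlite3 module and the column names a and b in place of "Column A" and "Column B".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (2, "A"), (2, "C"),
     (3, "A"), (3, "A"), (3, "B"), (3, "B")],
)

# DISTINCT only removes duplicate rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b FROM mytable ORDER BY a, b"
).fetchall()

# GROUP BY yields the same pairs, plus an aggregate per pair.
counted_rows = conn.execute(
    "SELECT a, b, COUNT(*) FROM mytable GROUP BY a, b ORDER BY a, b"
).fetchall()

print(distinct_rows)
print(counted_rows)
```

Both return one row per (a, b) pair; the GROUP BY version is only worth the extra syntax when the count (or another aggregate) is actually needed.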
You need to use GROUP BY on both columns.
select col1, col2 from test group by col1,col2;

Remove partial duplicates sql server

I am altering an existing view within SQL Server. My union statement creates something along the lines of:
Col1 | C2 | C3 | C4
-----|----|------|-----
1 A | B | NULL | NULL
2 A | B | C | NULL
3 A | B | C | D
4 E | F | NULL | NULL
5 E | F | G | NULL
However, I only want (in this scenario) rows 3 and 5. I need to omit rows 1 and 2 because they contain duplicate info: their non-NULL columns hold the same values as row 3, but row 3 is the most 'complete'. Row 5 is preferred over row 4 for the same reason.
Is this an outer join / intersect issue? How the heck do you create a view in this manner?
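Assuming that Col1 is never NULL, we can use ROW_NUMBER, partitioned by Col1 and ordered by the concatenation of all four columns in descending order, so that the most complete row in each group gets seq = 1: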
; with cte
AS
(
select ROW_NUMBER() over ( partition by col1 order by (coalesce(Col1,'')+
coalesce([C2],'') +
coalesce([C3],'') +
coalesce([C4],'') ) desc) as seq,
*
FROM Table1
)
select * from cte
where seq =1
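As a variation on the same idea, the ordering can count non-NULL columns instead of concatenating them. This is a swapped-in technique, not the query above, shown as a minimal sketch assuming SQLite 3.25 or later through Python's sqlite3 module and the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 TEXT, C2 TEXT, C3 TEXT, C4 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("A", "B", None, None),
        ("A", "B", "C", None),
        ("A", "B", "C", "D"),
        ("E", "F", None, None),
        ("E", "F", "G", None),
    ],
)

# Rank rows within each Col1 group by how many of C2..C4 are filled in,
# then keep the most complete row (seq = 1).
most_complete = conn.execute(
    """
    SELECT Col1, C2, C3, C4
    FROM (
        SELECT Col1, C2, C3, C4,
               ROW_NUMBER() OVER (
                   PARTITION BY Col1
                   ORDER BY CASE WHEN C2 IS NOT NULL THEN 1 ELSE 0 END
                          + CASE WHEN C3 IS NOT NULL THEN 1 ELSE 0 END
                          + CASE WHEN C4 IS NOT NULL THEN 1 ELSE 0 END DESC
               ) AS seq
        FROM Table1
    ) AS ranked
    WHERE seq = 1
    ORDER BY Col1
    """
).fetchall()

print(most_complete)  # [('A', 'B', 'C', 'D'), ('E', 'F', 'G', None)]
```

Like the original, this assumes rows that belong together share the same Col1 value.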

Distinct Values Ignoring Column Order

I have a table similar to:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 2 | 2 | 1 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
| 5 | 5 | 0 |
+----+---+---+
I want to remove all duplicate pairs of values, regardless of which column contains which value, e.g. after whatever the query might be I want to see:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
I'd like to find a solution in Microsoft SQL Server (has to work in <= 2005, though I'd be interested in any solutions which rely upon >= 2008 features regardless).
In addition, note that A and B are going to be in the range 1-100 (but that's not guaranteed forever. They are surrogate seeded integer foreign keys, however the foreign table might grow to a couple hundred rows max).
I'm wondering whether I'm missing some obvious solution here. The ones which have occurred all seem rather overwrought, though I do think they'd probably work, e.g.:-
Have a subquery return a bitfield with each bit corresponding to one of the ids and use this value to remove duplicates.
Somehow, pivot, remove duplicates, then unpivot. Likely to be tricky.
Thanks in advance!
Test data and sample below.
Basically, we do a self join with an OR condition, so rows match when either a=a and b=b, or a=b and b=a.
The WHERE in the subquery selects every row that has a matching pair with a lower id; those are the rows to eliminate.
I think this should work for triplicates as well (note I added a 6th row).
DECLARE @t table(id int, a int, b int)
INSERT INTO @t
VALUES
(1,1,2),
(2,2,1),
(3,3,4),
(4,0,5),
(5,5,0),
(6,5,0)
SELECT *
FROM @t
WHERE id NOT IN (
-- ids of rows that match an earlier row, in either column order
SELECT a.id
FROM @t a
INNER JOIN @t b
ON (a.a=b.a
AND a.b=b.b)
OR
(a.b=b.a
AND a.a = b.b)
WHERE a.id > b.id)
Try:
select min(Id) Id, A, B
from (select Id, A, B from DuplicatesTable where A <= B
union all
select Id, B A, A B from DuplicatesTable where A > B) v
group by A, B
order by 1
Not 100% tested and I'm sure it can be tidied up but it produces your required result:
DECLARE @T TABLE (id INT IDENTITY(1,1), A INT, B INT)
INSERT INTO @T
VALUES (1,2), (2,1), (3,4), (0,5), (5,0);
SELECT *
FROM @T
WHERE id IN (SELECT DISTINCT MIN(id)
FROM (SELECT id, a, b
FROM @T
UNION ALL
SELECT id, b, a
FROM @T) z
GROUP BY a, b)
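Another way to express the same idea is to normalize each pair to (smaller, larger) with CASE and keep the lowest Id per normalized pair, which returns the original rows unchanged. A minimal sketch, assuming SQLite through Python's sqlite3 module and the table name DuplicatesTable from the earlier answer; it has not been run on SQL Server 2005, although it uses only CASE, GROUP BY and IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicatesTable (Id INT, A INT, B INT)")
conn.executemany(
    "INSERT INTO DuplicatesTable VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 3, 4), (4, 0, 5), (5, 5, 0)],
)

# Group by the pair written as (smaller, larger), so (1, 2) and (2, 1)
# land in the same group, then keep the row with the lowest Id per group.
unique_pairs = conn.execute(
    """
    SELECT Id, A, B
    FROM DuplicatesTable
    WHERE Id IN (
        SELECT MIN(Id)
        FROM DuplicatesTable
        GROUP BY CASE WHEN A <= B THEN A ELSE B END,
                 CASE WHEN A <= B THEN B ELSE A END
    )
    ORDER BY Id
    """
).fetchall()

print(unique_pairs)  # [(1, 1, 2), (3, 3, 4), (4, 0, 5)]
```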