Find distinct rows based on combination of two columns - sql

I am having trouble finding a good solution to a problem in SQL.
Say I have table like this:
ID | A | A
--------------------
4427 | 2 | 3
4427 | 3 | 2
4427 | 3 | 5
4427 | 5 | 3
4427 | 1 | 3
4427 | 2 | 5
4427 | 3 | 1
Now I want to find the rows with a unique combination of ID and A. By this I mean that I only want to see the rows where the combination of A(1) and A(2) exists only once. For example, the first two rows in the table are "the same", since the combination of 2 and 3 is the same as 3 and 2.
The expected result of my question would be:
ID | A | A
--------------------
4427 | 2 | 3
4427 | 3 | 5
4427 | 1 | 3
4427 | 2 | 5
I am using SQL Server 2008.

If I understand correctly, this would be a simple "distinct" query if ColA ALWAYS had a value less than ColB. Given this, you can use a CASE expression to put the smaller value first, combined with the DISTINCT operator.
Try this.
Select Distinct
    ID,
    Case When ColA < ColB Then ColA Else ColB End As A1,
    Case When ColA < ColB Then ColB Else ColA End As A2
From YourTableHere
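To see the idea in action on the data from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server 2008, and the table name pairs and column names a1, a2, low and high are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id INTEGER, a1 INTEGER, a2 INTEGER)")
conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", [
    (4427, 2, 3), (4427, 3, 2), (4427, 3, 5), (4427, 5, 3),
    (4427, 1, 3), (4427, 2, 5), (4427, 3, 1),
])

# Put the smaller value first so (2, 3) and (3, 2) become the same row,
# then let DISTINCT collapse the duplicates.
distinct_pairs = conn.execute("""
    SELECT DISTINCT
        id,
        CASE WHEN a1 < a2 THEN a1 ELSE a2 END AS low,
        CASE WHEN a1 < a2 THEN a2 ELSE a1 END AS high
    FROM pairs
    ORDER BY low, high
""").fetchall()

for row in distinct_pairs:
    print(row)
```

The four rows that come back are the four combinations from the expected result, with the smaller value always in the first column.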

Here is one way to do this. FYI, the output A1/A2 may be in a different order than the original values. If you don't care about the order, you can try this.
WITH cte AS (
SELECT ID,A1 AS A, ROW_NUMBER() OVER (ORDER BY A1,A2) AS rn FROM Table1
UNION ALL
SELECT ID,A2 AS A, ROW_NUMBER() OVER (ORDER BY A1,A2) AS rn FROM Table1
)
SELECT DISTINCT ID, MIN(A) AS A1, MAX(A) AS A2
FROM cte
GROUP BY ID, rn

Related

Select IDs from multiple rows where column values satisfy one condition but not another

Hello I have the following problem.
I have a table like the one in this sql fiddle
This table defines a relationship and it contains IDs from two other tables
example values
| FirstID | SecondID |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I want to select all the FirstIDs that satisfy the following criteria.
Their corresponding SecondIDs are in the range 1-3 AND NOT in the range 4-5
For example in this case we would want FirstIDs 1 and 3.
I have tried the following queries
SELECT FirstID from table
WHERE SecondID IN (1,2,3) AND SecondID NOT IN (4,5)
SELECT FirstID,SecondID
FROM(
SELECT FirstID, SecondID
FROM table
WHERE SecondID in (1,2,3,4,5) )
WHERE SecondID NOT IN (4,5)
but I don't get the correct results I am aiming for.
What is the correct query to get the data I want?
SELECT DISTINCT FirstID
FROM test
WHERE SecondID IN (1,2,3) --Included values
AND FirstID NOT IN (SELECT FirstID FROM test
WHERE SecondID IN (4,5)) --Excluded values
How about min() and max():
select firstid
from t
group by firstid
having min(secondId) between 1 and 3 and
max(secondid) between 1 and 3;
Assuming 1 is the minimum, then this can be simplified to:
having max(secondid) <= 3;
For arbitrary ranges, you can use sum(case):
having sum(case when secondId between 1 and 3 then 1 else 0 end) > 0 and
sum(case when secondId between 4 and 5 then 1 else 0 end) = 0;
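A quick way to check the sum(case) version against the example values is the sketch below. It assumes Python's sqlite3 module as a stand-in database, with the table named t as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (firstid INTEGER, secondid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3), (2, 4), (2, 5),
    (3, 1), (3, 2), (3, 3),
])

# Keep a firstid only if it has at least one secondid in 1-3
# and no secondid at all in 4-5.
matching = [row[0] for row in conn.execute("""
    SELECT firstid
    FROM t
    GROUP BY firstid
    HAVING SUM(CASE WHEN secondid BETWEEN 1 AND 3 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN secondid BETWEEN 4 AND 5 THEN 1 ELSE 0 END) = 0
    ORDER BY firstid
""")]

print(matching)
```

FirstID 2 drops out because two of its rows fall in the excluded range, which leaves 1 and 3 as asked for in the question.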
I think Gonzalo Lorieto probably has the best answer to this question already, but depending on the size of your data, SELECT statements in a WHERE clause can get really slow, and the query below might be significantly faster (although it's not clear that it's worth the reduced readability).
SELECT inrange.FirstId FROM
t inrange
LEFT OUTER JOIN
(SELECT FirstID FROM t
WHERE SecondID IN (4,5)) outrange
ON inrange.firstID = outrange.firstId
WHERE SecondID IN (1,2,3)
AND outrange.firstId IS NULL
GROUP BY inrange.FirstId
You will want to use a NOT EXISTS clause to exclude the FirstIDs that have an invalid SecondID. Here is an example:
SELECT FirstID from test Has123
WHERE SecondID IN (1,2,3)
AND NOT EXISTS (
SELECT 1 FROM test Not45
WHERE Has123.FirstID = Not45.FirstID
AND Not45.SecondID IN (4,5)
)
GROUP BY FirstID

How to delete duplicate rows based on one column in postgreSQL?

Say I have column A, B and Date, and I want all rows which are duplicated in A to be removed, while keeping the one with the most recent Date. How would I do this?
I have looked at many other solutions but none seem to work for my case.
Thanks in advance for any help
This should work for you:
DELETE FROM YourTable USING
(SELECT colA, MAX(Date) maxDate
FROM YourTable
GROUP BY colA
) AS Keep
WHERE Keep.maxDate <> YourTable.Date
AND Keep.ColA = YourTable.ColA
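The keep-the-newest-row-per-A idea can be tried out on a tiny data set. This is only a sketch: it assumes Python's sqlite3 module instead of PostgreSQL, and since SQLite has no DELETE ... USING, a correlated subquery is swapped in for the join. The table name sample and the integer column dat (standing in for Date) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (a INTEGER, b INTEGER, dat INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 1, 2), (1, 2, 3), (2, 2, 3), (2, 2, 4),
])

# Delete every row that is older than the newest row sharing its a value.
deleted = conn.execute("""
    DELETE FROM sample
    WHERE dat < (SELECT MAX(newest.dat)
                 FROM sample AS newest
                 WHERE newest.a = sample.a)
""").rowcount

remaining = conn.execute(
    "SELECT a, b, dat FROM sample ORDER BY a"
).fetchall()

print(deleted, remaining)
```

If two rows share the same A and the same latest date, both survive, so a tie-breaker column would be needed in that case.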
will stay:
t=# with sample(a,b,dat) as (values(1,1,1),(1,1,2),(1,2,3),(2,2,3),(2,2,4))
, comparison as (select *,max(dat) over (partition by a) from sample)
select *
from comparison
where dat = max;
a | b | dat | max
---+---+-----+-----
1 | 2 | 3 | 3
2 | 2 | 4 | 4
(2 rows)
and thus to be deleted:
t=# with sample(a,b,dat) as (values(1,1,1),(1,1,2),(1,2,3),(2,2,3),(2,2,4))
, comparison as (select *,max(dat) over (partition by a) from sample)
delete
from comparison
where dat <> max
returning *;
a | b | dat | max
---+---+-----+-----
1 | 1 | 1 | 3
1 | 1 | 2 | 3
2 | 2 | 3 | 4
(3 rows)
Of course, instead of comparison you should use the name of your own table.

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
Ideally, I'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
One solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
This might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique, or switch to row_number to get an arbitrary one of the tied rows.
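On systems without QUALIFY, the same ranking can be done in a derived table and filtered in the outer query. A minimal sketch, assuming Python's sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer); the table name my_table and the alias rnk are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, line INTEGER, value INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 1, 10), (1, 2, 5), (2, 1, 6), (3, 1, 7), (3, 2, 4),
])

# Rank the rows of each id by line, highest first, then keep rank 1.
# The outer WHERE plays the role that QUALIFY plays above.
latest_values = conn.execute("""
    SELECT id, value
    FROM (
        SELECT id, value,
               RANK() OVER (PARTITION BY id ORDER BY line DESC) AS rnk
        FROM my_table
    )
    WHERE rnk = 1
    ORDER BY id
""").fetchall()

print(latest_values)
```

With the sample rows from the question, this picks the value on the highest line for each ID.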

I want to group one column based on a condition on another column

Suppose I have a table like this :
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | A
2 | C
3 | A
3 | A
3 | B
3 | B
I want to write a query that groups the values in such a way that I get a table like this:
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | C
3 | A
3 | B
You are looking for DISTINCT. DISTINCT removes duplicates from your query result.
select distinct * from mytable;
An alternative would be aggregation. You'd get a result row per a and b by grouping by them. You'd only use this however, when you want aggregates, e.g. the number of rows, a sum, an average, etc., because otherwise you can use DISTINCT as shown and should prefer it.
select a, b, count(*) from mytable group by a, b;
You need to use GROUP BY on both columns.
select col1, col2 from test group by col1,col2;

Select multiple distinct rows from table SQL

I am attempting to select distinct (last updated) rows from a table in my database. I am trying to get the last updated row for each "Sub section". However I cannot find a way to achieve this.
The table looks like:
ID | Name |LastUpdated | Section | Sub |
1 | Name1 | 2013-04-07 16:38:18.837 | 1 | 1 |
2 | Name2 | 2013-04-07 15:38:18.837 | 1 | 2 |
3 | Name3 | 2013-04-07 12:38:18.837 | 1 | 1 |
4 | Name4 | 2013-04-07 13:38:18.837 | 1 | 3 |
5 | Name5 | 2013-04-07 17:38:18.837 | 1 | 3 |
What I am trying to get my SQL Statement to do is return rows:
1, 2, and 5.
They are distinct for the Sub, and the most recent.
I have tried:
SELECT DISTINCT Sub, LastUpdated, Name
FROM TABLE
WHERE LastUpdated = (SELECT MAX(LastUpdated) FROM TABLE WHERE Section = 1)
Which only returns the distinct row for the most recent updated Row. Which makes sense.
I have googled what I am trying, and checked relevant posts on here. However not managed to find one which really answers what I am trying.
You can use the row_number() window function to assign numbers for each partition of rows with the same value of Sub. Using order by LastUpdated desc, the row with row number one will be the latest row:
select *
from (
select row_number() over (
partition by Sub
order by LastUpdated desc) as rn
, *
from YourTable
) as SubQueryAlias
where rn = 1
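Run against the rows from the question, this pattern should give back IDs 1, 2 and 5. A minimal sketch, assuming Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer), with the timestamps stored as ISO-style text so they sort correctly; the table name updates and the snake_case column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE updates (
        id INTEGER, name TEXT, last_updated TEXT,
        section INTEGER, sub INTEGER
    )
""")
conn.executemany("INSERT INTO updates VALUES (?, ?, ?, ?, ?)", [
    (1, "Name1", "2013-04-07 16:38:18.837", 1, 1),
    (2, "Name2", "2013-04-07 15:38:18.837", 1, 2),
    (3, "Name3", "2013-04-07 12:38:18.837", 1, 1),
    (4, "Name4", "2013-04-07 13:38:18.837", 1, 3),
    (5, "Name5", "2013-04-07 17:38:18.837", 1, 3),
])

# Number the rows of each sub from newest to oldest and keep number 1.
latest_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY sub
                   ORDER BY last_updated DESC) AS rn
        FROM updates
    )
    WHERE rn = 1
    ORDER BY id
""")]

print(latest_ids)
```

Because the whole row is available in the inner query, Name and any other column can be returned alongside the ID without extra joins.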
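Wouldn't it be enough to use group by? The WHERE clause has to come before GROUP BY. Note that MIN(Name) returns the alphabetically smallest name within each Sub, which is not necessarily the name on the most recently updated row.
SELECT DISTINCT MIN(Sub), MAX(LastUpdated), MIN(NAME) FROM TABLE WHERE Section = 1 GROUP BY Sub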