Selecting unique rows in a set of two possibilities - sql

The problem itself is simple, but I can't figure out a solution that does it in one query. Here's my "abstraction" of the problem to allow for a simpler explanation:
I will let my original explanation stand, but here's a set of sample data and the result I expect:
Here's some sample data; I separated the pairs with a blank line:
-------------
| Key | Col | (Together they form a unique pair)
--------------
| 1 Foo |
| 1 Bar |
| |
| 2 Foo |
| |
| 3 Bar |
| |
| 4 Foo |
| 4 Bar |
--------------
And here is the result I expect. It needs to come from a single query:
1 - Foo
2 - Foo
3 - Bar
4 - Foo
Original explanation:
I have a table, call it TABLE, with two columns, ID and NAME, which together form the primary key. Now I want to select rows where ID=1: the query should first check whether a row exists where NAME is "John"; if "John" does not exist, it should look for a row where NAME is "Bruce". It should return only "John" if both "Bruce" and "John" exist, or if only "John" exists.
Also note that it should be able to return several rows per query that match the above criteria but with different ID/NAME combinations, and that the above explanation is just a simplification of the real problem.
I could be completely blinded by my own code and line of thought, but I just can't figure this out.

This is fairly similar to what you wrote, but it should be fairly speedy, as NOT EXISTS is typically more efficient than NOT IN in this case:
mysql> select * from foo;
+----+-----+
| id | col |
+----+-----+
| 1 | Bar |
| 1 | Foo |
| 2 | Foo |
| 3 | Bar |
| 4 | Bar |
| 4 | Foo |
+----+-----+
SELECT id
     , col
FROM foo f1
WHERE col = 'Foo'
   OR ( col = 'Bar' AND NOT EXISTS( SELECT *
                                    FROM foo f2
                                    WHERE f1.id = f2.id
                                      AND f2.col = 'Foo'
                                  )
      );
+----+-----+
| id | col |
+----+-----+
| 1 | Foo |
| 2 | Foo |
| 3 | Bar |
| 4 | Foo |
+----+-----+

You can join the initial table to itself with an OUTER JOIN like this:
create table #mytest
(
id int,
Name varchar(20)
);
go
insert into #mytest values (1,'Foo');
insert into #mytest values (1,'Bar');
insert into #mytest values (2,'Foo');
insert into #mytest values (3,'Bar');
insert into #mytest values (4,'Foo');
insert into #mytest values (4,'Bar');
go
select distinct
sc.id,
isnull(fc.Name, sc.Name) sel_name
from
#mytest sc
LEFT OUTER JOIN #mytest fc
on (fc.id = sc.id
and fc.Name = 'Foo')
like that.

No need to make this overly complex, you can just use MAX() and group by ...
select id, max(col) from foo group by id
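Note that MAX(col) gives the expected result here only because 'Foo' happens to sort after 'Bar'. A minimal sketch of making the priority explicit instead, assuming Python's sqlite3 module as a stand-in engine and the foo table from the sample data (the CASE ranking is an illustration, not part of the original answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, col TEXT, PRIMARY KEY (id, col))")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "Foo"), (1, "Bar"), (2, "Foo"), (3, "Bar"), (4, "Foo"), (4, "Bar")],
)

# MAX(col) prefers 'Foo' purely because of string ordering.
by_max = conn.execute(
    "SELECT id, MAX(col) FROM foo GROUP BY id ORDER BY id"
).fetchall()

# Explicit ranking: the lowest rank wins, then the rank is mapped back to a value.
by_rank = conn.execute(
    """
    SELECT id,
           CASE MIN(CASE col WHEN 'Foo' THEN 1 WHEN 'Bar' THEN 2 END)
                WHEN 1 THEN 'Foo'
                ELSE 'Bar'
           END AS col
    FROM foo
    WHERE col IN ('Foo', 'Bar')
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(by_max)
print(by_rank)
```

With the ranking spelled out, swapping the preferred value only means swapping the two rank numbers, whereas MAX() would have to become MIN().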

try this:
select top 1 * from (
SELECT 1 as num, * FROM TABLE WHERE ID = 1 AND NAME = 'John'
union
SELECT 2 as num, * FROM TABLE WHERE ID = 1 AND NAME = 'Bruce'
) t
order by num

I came up with a solution myself, but it's rather complex and slow, and it doesn't extend well to more advanced queries:
SELECT *
FROM users
WHERE name = "bruce"
   OR (
        name = "john"
        AND id NOT IN (
            SELECT id
            FROM users
            WHERE name = "bruce"
        )
      )
Are there no alternatives without heavy joins, etc.?

I did solve it above, but that query is horribly inefficient for larger tables. Is there any other way?

Here's an example that works in SQL Server 2005 and later. It's a useful pattern where you want to choose the top row (or top n rows) based on a custom ordering. This will let you not just choose among two values with custom priorities, but any number. You can use the ROW_NUMBER() function and a CASE expression:
CREATE TABLE T (id int, col varchar(10));
INSERT T VALUES (1, 'Foo')
INSERT T VALUES (1, 'Bar')
INSERT T VALUES (2, 'Foo')
INSERT T VALUES (3, 'Bar')
INSERT T VALUES (4, 'Foo')
INSERT T VALUES (4, 'Bar')
SELECT id, col
FROM
  (SELECT id, col,
          ROW_NUMBER() OVER (
            PARTITION BY id
            ORDER BY
              CASE col
                WHEN 'Foo' THEN 1
                WHEN 'Bar' THEN 2
                ELSE 3 END
          ) AS RowNum
   FROM T
  ) AS X
WHERE RowNum = 1
ORDER BY id
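The same ROW_NUMBER() pattern can be tried outside SQL Server. A minimal sketch, assuming Python's sqlite3 module with a bundled SQLite of version 3.25 or later (needed for window functions); the extra id 5 rows with a third value 'Baz' are hypothetical and only there to exercise the ELSE branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, col TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [
        (1, "Foo"), (1, "Bar"), (2, "Foo"), (3, "Bar"), (4, "Foo"), (4, "Bar"),
        (5, "Baz"), (5, "Bar"),  # hypothetical rows, not in the original data
    ],
)

# Number the rows of each id by custom priority and keep only the first one.
top_rows = conn.execute(
    """
    SELECT id, col
    FROM (SELECT id, col,
                 ROW_NUMBER() OVER (
                     PARTITION BY id
                     ORDER BY CASE col
                                  WHEN 'Foo' THEN 1
                                  WHEN 'Bar' THEN 2
                                  ELSE 3
                              END
                 ) AS RowNum
          FROM T) AS X
    WHERE RowNum = 1
    ORDER BY id
    """
).fetchall()

print(top_rows)
```

Changing the filter to RowNum <= n would keep the top n rows per id rather than just the first.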

In PostgreSQL, I believe it would be this:
SELECT DISTINCT ON (id) id, name
FROM mytable
ORDER BY id, name = 'John' DESC;
Update: false sorts before true, hence the DESC; I had it backwards originally. Note that DISTINCT ON is a PostgreSQL feature and not part of standard SQL. It keeps only the first row it comes across for any given id. Since we order by whether the name is John, rows named John will be selected over all other names.
With your second example, it would be:
SELECT DISTINCT ON (key) key, col
FROM mytable
ORDER BY key, col = 'Foo' DESC;
This will give you:
1 - Foo
2 - Foo
3 - Bar
4 - Foo

You can use a join instead of EXISTS, which may improve the query plan in cases where the optimizer is not smart enough:
SELECT f1.id
,f1.col
FROM foo f1
LEFT JOIN foo f2
ON f1.id = f2.id
AND f2.col = 'Foo'
WHERE f1.col = 'Foo'
OR ( f1.col = 'Bar' AND f2.id IS NULL )
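As a quick check that the join form and the NOT EXISTS form agree on the sample data, here is a minimal sketch assuming Python's sqlite3 module as a stand-in engine. It relies on (id, col) being unique, as stated in the question; with duplicate 'Foo' rows per id the join would return duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, col TEXT, PRIMARY KEY (id, col))")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "Foo"), (1, "Bar"), (2, "Foo"), (3, "Bar"), (4, "Foo"), (4, "Bar")],
)

not_exists_sql = """
SELECT f1.id, f1.col
FROM foo f1
WHERE f1.col = 'Foo'
   OR (f1.col = 'Bar' AND NOT EXISTS (
          SELECT 1 FROM foo f2 WHERE f2.id = f1.id AND f2.col = 'Foo'))
ORDER BY f1.id
"""

left_join_sql = """
SELECT f1.id, f1.col
FROM foo f1
LEFT JOIN foo f2
       ON f1.id = f2.id AND f2.col = 'Foo'
WHERE f1.col = 'Foo'
   OR (f1.col = 'Bar' AND f2.id IS NULL)
ORDER BY f1.id
"""

via_exists = conn.execute(not_exists_sql).fetchall()
via_join = conn.execute(left_join_sql).fetchall()
print(via_exists)
print(via_join)
```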

Related

how do I make a query which references other rows in a resultset?

The question is best asked with an example.
I have a table
id | name | attr
1 | foo | a
2 | bar | a
3 | baz | b
and I want a query that gives me all the rows that share the same attr as the row with name 'foo', and thus returns
id | name | attr
1 | foo | a
2 | bar | a
because foo has attr=a, as does bar
You can use exists:
select t.*
from mytable t
where exists (
select 1 from mytable t1 where t1.attr = t.attr and t1.name = 'foo'
)
Note that this solution would also properly work if 'foo' had more than one attribute.
For performance, you want an index on (attr, name).
A simple way is a scalar subquery (this assumes 'foo' appears in only one row):
select t.*
from t
where t.attr = (select t2.attr from t t2 where t2.name = 'foo');
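If 'foo' can have more than one attribute, an `=` comparison against a subquery is no longer safe, while IN (like EXISTS) still works. A minimal sketch, assuming Python's sqlite3 module as a stand-in engine; the fourth row, giving 'foo' a second attribute, is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, attr TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "foo", "a"), (2, "bar", "a"), (3, "baz", "b")],
)

same_attr_sql = """
SELECT id, name, attr
FROM mytable
WHERE attr IN (SELECT attr FROM mytable WHERE name = 'foo')
ORDER BY id
"""

# Original data: only rows with attr 'a' are returned.
single_attr = conn.execute(same_attr_sql).fetchall()

# Hypothetical extra row: 'foo' now also has attr 'b'.
conn.execute("INSERT INTO mytable VALUES (4, 'foo', 'b')")
multi_attr = conn.execute(same_attr_sql).fetchall()

print(single_attr)
print(multi_attr)
```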

SQL Select a group when attributes match at least a list of values

Given a table with a (non-distinct) identifier and a value:
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
How can you select the identifiers whose values include every value in a given list (e.g. ('B', 'C'))?
This list might also be the result of another query (like SELECT Value from Table1 WHERE ID = '2' to find all IDs which have a superset of values, compared to ID=2 (only ID=1 in this example))
Result
| ID |
|----|
| 1 |
| 2 |
1 and 2 are part of the result, as they have both B and C in their Value column. 3 is not included, as it is missing C.
Thanks to the answer to this question: SQL Select only rows where exact multiple relationships exist, I created a query which works for a fixed list. However, I need to be able to use the results of another query without changing the query. (It also requires the Access-specific IIF function):
SELECT ID FROM Table1
GROUP BY ID
HAVING SUM(Value NOT IN ('A', 'B')) = 0
AND SUM(IIF(Value='A', 1, 0)) = 1
AND SUM(IIF(Value='B', 1, 0)) = 1
In case it matters: The SQL is run on a Excel-table via VBA and ADODB.
In the WHERE clause, filter on the list of values you would like to see, group by id, and in the HAVING clause keep only those ids that have 3 matching rows.
select id from table1
where value in ('A', 'B', 'C') --you can use a result of another query here
group by id
having count(*)=3
If you can have the same id - value pair more than once, then you need to slightly alter the having clause: having count(distinct value)=3
If you want to make it completely dynamic based on a subquery, then:
select id, min(valcount) as minvalcount from table1
cross join (select count(*) as valcount from table1 where id=2) as t1
where value in (select value from table1 where id=2) --you can use a result of another query here
group by id
having count(*)=minvalcount
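The dynamic variant can also be written with the required count as a scalar subquery in HAVING, which avoids the cross join. A minimal sketch, assuming Python's sqlite3 module as a stand-in engine (the asker's setup is Excel via ADODB, where the syntax may differ); the helper name `supersets_of` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
     (2, "A"), (2, "B"), (2, "C"),
     (3, "A"), (3, "B")],
)

def supersets_of(ref_id):
    """Return the ids whose values include every value of ref_id."""
    rows = conn.execute(
        """
        SELECT id
        FROM table1
        WHERE value IN (SELECT value FROM table1 WHERE id = ?)
        GROUP BY id
        HAVING COUNT(DISTINCT value) =
               (SELECT COUNT(DISTINCT value) FROM table1 WHERE id = ?)
        ORDER BY id
        """,
        (ref_id, ref_id),
    ).fetchall()
    return [row[0] for row in rows]

print(supersets_of(2))
print(supersets_of(3))
```

The reference id always matches itself; adding a condition such as `AND id <> ?` to the WHERE clause would exclude it if only the other ids are wanted.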

PostgreSQL, splitting to rows and recognizing the first part

I have a table in PostgreSQL:
CREATE TABLE t1 (
id SERIAL,
values TEXT);
INSERT INTO t1 (values)
VALUES ('T815,T847'), ('F00,B4R,B4Z'), ('AS5,XX3'), ('G00');
like:
id|values
--------------
1 |T815,T847
2 |F00,B4R,B4Z
3 |AS5,XX3
4 |G00
I need to split the values in the values column into their own rows and mark the first value of each original row, like this:
id|first|value
--------------
1 | yes | T815
1 | | T847
2 | yes | F00
2 | | B4R
2 | | B4Z
3 | yes | AS5
3 | | XX3
4 | yes | G00
I can produce id and value with:
SELECT t1.id,
regexp_split_to_table(t1.values, E',')
FROM t1;
but how would I recognize and tag the first values?
There may be a better way, but you can use a brute force method:
SELECT t1.id, value,
(case when values like value || '%' then 'yes' end) as first
FROM (SELECT t1.id, t1.values,
regexp_split_to_table(t1.values, E',') as value
FROM t1
) t1;
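One caveat with the LIKE test is that it tags by prefix rather than by position, so a later value equal to the first one (e.g. 'A,A') would also be tagged. As a language-neutral illustration of position-based tagging, here is a small Python sketch of the intended transformation; the function name `split_and_mark` is hypothetical and this is not a PostgreSQL solution:

```python
def split_and_mark(rows):
    """Split each comma-separated string and tag only the first element."""
    result = []
    for row_id, csv_values in rows:
        for position, value in enumerate(csv_values.split(",")):
            # Tag by position, not by comparing the text of the value.
            first = "yes" if position == 0 else None
            result.append((row_id, first, value))
    return result

sample = [(1, "T815,T847"), (2, "F00,B4R,B4Z"), (3, "AS5,XX3"), (4, "G00")]
marked = split_and_mark(sample)
for row in marked:
    print(row)
```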

Distinct Values Ignoring Column Order

I have a table similar to:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 2 | 2 | 1 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
| 5 | 5 | 0 |
+----+---+---+
I want to remove all duplicate pairs of values, regardless of which column contains which value, e.g. after whatever the query might be I want to see:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
I'd like to find a solution in Microsoft SQL Server (has to work in <= 2005, though I'd be interested in any solutions which rely upon >= 2008 features regardless).
In addition, note that A and B are going to be in the range 1-100 (but that's not guaranteed forever. They are surrogate seeded integer foreign keys, however the foreign table might grow to a couple hundred rows max).
I'm wondering whether I'm missing some obvious solution here. The ones which have occurred all seem rather overwrought, though I do think they'd probably work, e.g.:-
Have a subquery return a bitfield with each bit corresponding to one of the ids and use this value to remove duplicates.
Somehow, pivot, remove duplicates, then unpivot. Likely to be tricky.
Thanks in advance!
Test data and sample below.
Basically, we do a self join with an OR criteria so either a=a and b=b OR a=b and b=a.
The WHERE in the subquery gives you the max for each pair to eliminate.
I think this should work for triplicates as well (note I added a 6th row).
DECLARE @t table(id int, a int, b int)
INSERT INTO @t
VALUES
(1,1,2),
(2,2,1),
(3,3,4),
(4,0,5),
(5,5,0),
(6,5,0)
SELECT *
FROM @t
WHERE id NOT IN (
    SELECT a.id
    FROM @t a
    INNER JOIN @t b
        ON (a.a = b.a
            AND a.b = b.b)
        OR
           (a.b = b.a
            AND a.a = b.b)
    WHERE a.id > b.id)
Try:
select min(Id) Id, A, B
from (select Id, A, B from DuplicatesTable where A <= B
union all
select Id, B A, A B from DuplicatesTable where A > B) v
group by A, B
order by 1
Not 100% tested and I'm sure it can be tidied up but it produces your required result:
DECLARE @T TABLE (id INT IDENTITY(1,1), A INT, B INT)
INSERT INTO @T
VALUES (1,2), (2,1), (3,4), (0,5), (5,0);
SELECT *
FROM @T
WHERE id IN (SELECT DISTINCT MIN(id)
             FROM (SELECT id, a, b
                   FROM @T
                   UNION ALL
                   SELECT id, b, a
                   FROM @T) z
             GROUP BY a, b)
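The idea shared by these answers is to normalize each pair so the smaller value always comes first, then group on the normalized pair. A minimal sketch, assuming Python's sqlite3 module as a stand-in engine (the CASE expressions themselves are plain SQL and should carry over to SQL Server); the table name `t` and the aliases `lo` and `hi` are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 3, 4), (4, 0, 5), (5, 5, 0), (6, 5, 0)],
)

# Order each pair as (lo, hi), then keep the lowest id per distinct pair.
deduped = conn.execute(
    """
    SELECT MIN(id) AS id, lo, hi
    FROM (SELECT id,
                 CASE WHEN a <= b THEN a ELSE b END AS lo,
                 CASE WHEN a <= b THEN b ELSE a END AS hi
          FROM t) AS v
    GROUP BY lo, hi
    ORDER BY id
    """
).fetchall()

print(deduped)
```

This returns the pair in normalized order rather than as originally stored; on this data the two coincide because each surviving row already has A <= B.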

How do I get LIKE and COUNT to return the number of rows less than a value not in the row?

For example:
SELECT COUNT(ID) FROM My_Table
WHERE ID <
(SELECT ID FROM My_Table
WHERE ID LIKE '%4'
ORDER BY ID LIMIT 1)
My_Table:
X ID Y
------------------------
| | A1 | |
------------------------
| | B2 | |
------------------------
| | C3 | |
------------------------ -----Page 1
| | D3 | |
------------------------
| | E3 | |
------------------------
| | F5 | |
------------------------ -----Page 2
| | G5 | |
------------------------
| | F6 | |
------------------------
| | G7 | | -----Page 3
There is no data ending in 4, but there are still 5 rows that end in something less than "%4".
However, in this case, where there is no match, SQLite only returns 0.
I get that it is not there, but how do I change this behavior to still return the number of rows before it, as if it were there?
Any suggestions?
Thank You.
SELECT COUNT(ID) FROM My_Table
WHERE ID < (SELECT ID FROM My_Table
WHERE SUBSTRING(ID, 2) >= 4
ORDER BY ID LIMIT 1)
Assuming there is always one letter before the number part of the id field, you may want to try the following:
SELECT COUNT(*) FROM my_table WHERE CAST(substr(id, 2) as int) <= 4;
Test case:
CREATE TABLE my_table (id char(2));
INSERT INTO my_table VALUES ('A1');
INSERT INTO my_table VALUES ('B2');
INSERT INTO my_table VALUES ('C3');
INSERT INTO my_table VALUES ('D3');
INSERT INTO my_table VALUES ('E3');
INSERT INTO my_table VALUES ('F5');
INSERT INTO my_table VALUES ('G5');
INSERT INTO my_table VALUES ('F6');
INSERT INTO my_table VALUES ('G7');
Result:
5
UPDATE: Further to the comment below, you may want to consider using the ltrim() function:
The ltrim(X,Y) function returns a string formed by removing any and all characters that appear in Y from the left side of X.
Example:
SELECT COUNT(*)
FROM my_table
WHERE CAST(ltrim(id, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as int) <= 4;
Test case (adding to the above):
INSERT INTO my_table VALUES ('ABC1');
INSERT INTO my_table VALUES ('ZWY2');
New Result:
7
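The reason ltrim() matters for longer prefixes is that SQLite's CAST returns 0 for text that does not start with a number, so substr(id, 2) silently counts such ids. A minimal sketch, assuming Python's sqlite3 module; the id 'ABC9' is a hypothetical value chosen so that the two approaches disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT)")
ids = ["A1", "B2", "C3", "D3", "E3", "F5", "G5", "F6", "G7",
       "ABC9"]  # hypothetical multi-letter id whose number is above 4
conn.executemany("INSERT INTO my_table VALUES (?)", [(i,) for i in ids])

# substr(id, 2) yields 'BC9' for 'ABC9'; CAST turns that into 0, which is <= 4.
substr_count = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE CAST(substr(id, 2) AS INTEGER) <= 4"
).fetchone()[0]

# ltrim strips every leading letter, leaving '9', which is correctly excluded.
ltrim_count = conn.execute(
    "SELECT COUNT(*) FROM my_table "
    "WHERE CAST(ltrim(id, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') AS INTEGER) <= 4"
).fetchone()[0]

print(substr_count, ltrim_count)
```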
In MySQL that would be:
SELECT COUNT(ID)
FROM My_Table
WHERE ID <
(
SELECT id
FROM (
SELECT ID
FROM My_Table
WHERE ID LIKE '%4'
ORDER BY
ID
LIMIT 1
) q
UNION ALL
SELECT MAX(id)
FROM My_Table
LIMIT 1
)