Select greatest number of unique pairs from table - sql

I have the following table:
| a | b |   |
|---|---|---|
| 2 | 4 | x |
| 2 | 5 |   |
| 3 | 1 | x |
| 6 | 4 |   |
| 6 | 5 | x |
| 7 | 5 |   |
| 7 | 4 |   |
I want to select the greatest number of unique pairs possible, where neither a nor b is repeated. So the entries marked with an x are what the select should grab. Any ideas how to do this?
Currently I have some SQL that does the opposite: it selects the rows that aren't unique and deletes them, but it has not been working the way I want it to. This is the SQL I have right now, but I think I'm going to scrap it and work at it from the angle I have stated above.
delete t
from #temp2 t
where (exists (select * from #temp2
               where b = t.b
                 and a < t.a)
    or exists (select * from #temp2
               where a = t.a
                 and b < t.b))
  and (not exists (select * from #temp2
                   where b = t.b
                     and a < t.a)
    or not exists (select * from #temp2
                   where a = t.a
                     and b < t.b))
Thanks!

I'm assuming here that being non-unique and being unique are mutually exclusive and together cover all records in your table. If so, take the SELECT from your existing script that identifies the non-unique records, put it in a CTE, then join to the CTE from your source table and select the records that are not in the CTE.
With Non_Unique_Records as (
    -- Insert the SELECT from your existing script that identifies the non-unique records here
)
Select t.a
     , t.b
From #temp2 t
Left Outer Join Non_Unique_Records CTE
    on t.a = CTE.a
   and t.b = CTE.b
Where CTE.b is null
Then just delete the records that the Select statement does not return (i.e. the ones that matched the CTE), so that only the unique pairs remain.
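For example, a minimal sketch of that final delete, assuming the duplicate check from the question is rewritten as a SELECT inside the CTE (the check itself is only a placeholder; swap in whatever correctly flags your non-unique rows):

With Non_Unique_Records as (
    -- placeholder check: rows whose a or b value also appears in an "earlier" row
    Select a, b
    From #temp2 t
    Where exists (select * from #temp2
                  where b = t.b and a < t.a)
       or exists (select * from #temp2
                  where a = t.a and b < t.b)
)
Delete t
From #temp2 t
Inner Join Non_Unique_Records CTE
    on t.a = CTE.a
   and t.b = CTE.b;

The Select in the answer above can be run first against the same CTE to check which rows will be kept.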

Related

Access VBA: Select only multiple values

Say, I have a table that looks like this:
ID | PNo | MM    | CP |
---|-----|-------|----|
1  | 13  | True  | 4  |
2  | 92  | True  | 3  |
3  | 1   | True  | 3  |
4  | 13  | False | 2  |
5  | 13  | True  | 3  |
6  | 1   | True  | 3  |
I want to go through all PNos, compare all rows with that PNo, and select only those that have different values in the MM field.
My plan was to create a table with the distinct values of PNo, iterate through that table by using the usual record set and write an SQL query for each PNo.
Now my problem is the construction of the SQL query.
I can select all rows with Table.PNo = rs("PNo") but I have no idea how to formulate the query to catch the rows with varying values.
You can use a subquery:
Select *
From YourTable
Where PNo IN
(Select PNo
From YourTable
Group By PNo, MM
Having Count(*) = 2)
I think this should work.
This will create a cartesian product on your PNo field, i.e. every record joined to every record with the same PNo.
SELECT *
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
You'll end up with 9 instances of PNo 13, 4 of 1 and 1 of 92. Now we just want to return the ones where MM is different, so add that to the WHERE clause.
SELECT *
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
WHERE T1.MM <> T2.MM
ORDER BY T1.ID
This will return four records. PNo 1 and 92 will have vanished as the MM result was the same for those. ID number 4 will be returned twice as the MM value is different from that in ID 1 and ID 5.
To remove the duplicate value you could then use DISTINCT:
SELECT DISTINCT T1.ID, T1.PNo, T1.MM, T1.CP
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
WHERE T1.MM <> T2.MM
ORDER BY T1.ID
Note: One of the differences between the answer given by @Jonathan and mine is that his query is updateable and mine isn't.
The following should do what you want:
SELECT * FROM MyTable WHERE PNo in
(SELECT t.PNo FROM MyTable t
INNER join MyTable f
ON t.PNo = f.PNo
WHERE t.MM = true and f.MM = false)
The inner join ensures that only those PNos that have both MM false and MM true are included.
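To see what that does against the sample data, here is the subquery on its own (a sketch; the table name MyTable comes from the answer above):

SELECT t.PNo
FROM MyTable t
INNER JOIN MyTable f ON t.PNo = f.PNo
WHERE t.MM = True AND f.MM = False
-- returns PNo 13 (once per True/False pairing); it is the only PNo with both
-- MM values, so the outer query returns the rows with ID 1, 4 and 5.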

Discard rows which is not MAX in that group

I have data like this:
a      | b      | c
-------|--------|--------
100    | 3      | 50
100    | 4      | 60
101    | 3      | 70
102    | 3      | 70
102    | 4      | 80
102    | 5      | 90
a : key
b : sub_id
c : value
I want to set c to NULL in each row whose c is not the maximum c within its a group.
My resulting table must look like:
a      | b      | c
-------|--------|--------
100    | 3      | NULL
100    | 4      | 60
101    | 3      | 70
102    | 3      | NULL
102    | 4      | NULL
102    | 5      | 90
How can I do this with an SQL Query?
UPDATE:
My table has about a billion rows. Please keep that in mind when providing an answer; I cannot wait a couple of hours or a day for the statement to execute.
Updated after the requirement was changed to "update the table":
with max_values as (
    select a,
           b,
           max(c) over (partition by a) as max_c
    from the_table
)
update the_table
set c = null
from max_values mv
where mv.a = the_table.a
  and mv.b = the_table.b
  and mv.max_c <> the_table.c;
SQLFiddle: http://sqlfiddle.com/#!15/1e739/1
Another possible solution, which might be faster (but you need to check the execution plan):
update the_table t1
set c = null
where exists (select 1
              from the_table t2
              where t2.a = t1.a
                and t1.c < t2.c);
SQLFiddle: http://sqlfiddle.com/#!15/1e739/2
But with "billion" rows there is no way this is going to be really fast.
DECLARE #TAB TABLE (A INT,B INT,C INT)
INSERT INTO #TAB VALUES
(100,3,50),
(100,4,60),
(101,3,70),
(102,3,70),
(102,4,80),
(102,5,90)
UPDATE X
SET C = NULL
FROM #TAB X
LEFT JOIN (
SELECT A,MAX(C) C
FROM #TAB
GROUP BY A) LU ON X.A = LU.A AND X.C = LU.C
WHERE LU.A IS NULL
SELECT * FROM #TAB
Result:
A    B    C
---- ---- -----
100  3    NULL
100  4    60
101  3    70
102  3    NULL
102  4    NULL
102  5    90
This approach should help you.
How about this formulation?
select a, b,
       (case when c = max(c) over (partition by a) then c end) as c
from the_table t;
I'm not sure if you can get this faster. An index on a, c might help.
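For example, such an index might look like this (a sketch; the index name is made up and the_table is the placeholder name used in the other answers):

-- index covering the partition column and the value being compared
create index the_table_a_c_idx on the_table (a, c);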
SELECT a, b,
       CASE ROW_NUMBER() OVER (PARTITION BY a ORDER BY c DESC) WHEN 1 THEN c END AS c
FROM mytable

Postgresql: Insert the cartesian product of two or more sets

As a definition: the Cartesian product of two sets is the set of all possible pairs drawn from those sets, so {A,B} x {a,b} = {(A,a),(A,b),(B,a),(B,b)}.
Now I want to insert such a Cartesian product into a database table (each pair as a row). The intent is to fill the table with default values for each pair, so the data, i.e. the two sets, is not yet present in the database at this point.
Any idea how to achieve this with PostgreSQL?
EDIT:
With the help of Grzegorz Szpetkowski's answer I was able to produce a query that does what I want to achieve, but it really isn't the prettiest one. Suppose I want to insert the cartesian product of the sets {1,2,3} and {'A','B','C'}.
INSERT INTO "Test"
SELECT * FROM
(SELECT 1 UNION SELECT 2 UNION SELECT 3) P
CROSS JOIN
(SELECT 'A' UNION SELECT 'B' UNION SELECT 'C') Q
Is there any better way to do this?
EDIT 2:
The accepted answer is fine, but I found another version which might be more appropriate if things get more complex:
CREATE TEMP TABLE "Numbers" (ID integer) ON COMMIT DROP;
CREATE TEMP TABLE "Chars" (Char character varying) ON COMMIT DROP;
INSERT INTO "Numbers" (ID) VALUES (1),(2),(3);
INSERT INTO "Chars" (Char) VALUES ('A'),('B'),('C');
INSERT INTO "Test"
SELECT * FROM
"Numbers"
CROSS JOIN
"Chars";
I am not sure if this really answers your question, but PostgreSQL defines CROSS JOIN as:
For every possible combination of rows from T1 and T2 (i.e., a
Cartesian product), the joined table will contain a row consisting of
all columns in T1 followed by all columns in T2. If the tables have N
and M rows respectively, the joined table will have N * M rows.
FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2. It is also
equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below).
EDIT:
One way is to use VALUES lists (note that the result has no guaranteed order; use an ORDER BY clause if you need one):
SELECT N AS number, L AS letter FROM
(VALUES (1), (2), (3)) a(N)
CROSS JOIN
(VALUES ('A'), ('B'), ('C')) b(L);
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
2 | A
2 | B
2 | C
3 | A
3 | B
3 | C
(9 rows)
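Putting the VALUES form together with the INSERT from the question (a sketch; the target table "Test" and the two sets are the ones used in the question's edits):

INSERT INTO "Test"
SELECT N, L
FROM (VALUES (1), (2), (3)) a(N)
CROSS JOIN
     (VALUES ('A'), ('B'), ('C')) b(L);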
BTW:
For more numbers I believe it's handy to use the generate_series function, e.g.:
SELECT n AS number, chr(ascii('A') + L - 1) AS letter
FROM
generate_series(1, 5) N
CROSS JOIN
generate_series(1, 5) L
ORDER BY N, L;
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
1 | D
1 | E
2 | A
2 | B
2 | C
2 | D
2 | E
3 | A
3 | B
3 | C
3 | D
3 | E
4 | A
4 | B
4 | C
4 | D
4 | E
5 | A
5 | B
5 | C
5 | D
5 | E
(25 rows)

SQL problem, challenge

I want to get
id a b c
--------------------
1 1 100 90
6 2 50 100
...from:
id a b c
--------------------
1 1 100 90
2 1 300 50
3 1 200 20
4 2 200 30
5 2 300 70
6 2 50 100
It's the row with the smallest b within each group of a.
How to do it with sql?
EDIT
I thought it could be achieved by
select * from table group by a having min(b);
which I later found out is wrong.
But is it possible to do it with a HAVING clause?
I'm using MySQL
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.a=t2.a AND t1.b>t2.b)
WHERE t2.a IS NULL;
This works because there should be no matching row t2 with the same a and a lesser b.
update: This solution has the same issue with ties that other folks have identified. However, we can break ties:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.a=t2.a AND (t1.b>t2.b OR t1.b=t2.b AND t1.id>t2.id))
WHERE t2.a IS NULL;
Assuming for instance that in the case of a tie, the row with the lower id should be the row we choose.
This doesn't do the trick:
select * from table group by a having min(b);
Because HAVING MIN(b) only tests that the least value in the group is not false (which in MySQL means not zero). The condition in a HAVING clause is for excluding groups from the result, not for choosing the row within the group to return.
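A quick way to see this against the sample data (a sketch; mytable as used above, with an explicit MIN(b) in the select list so it is also valid under ONLY_FULL_GROUP_BY):

SELECT a, MIN(b)
FROM mytable
GROUP BY a
HAVING MIN(b);
-- both groups come back: MIN(b) is 100 for a=1 and 50 for a=2,
-- and any non-zero value counts as true in MySQL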
In MySQL:
select t1.* from test as t1
inner join
(select t2.a, min(t2.b) as min_b from test as t2 group by t2.a) as subq
on subq.a=t1.a and subq.min_b=t1.b;
Here is the proof:
mysql> create table test (id int unsigned primary key auto_increment, a int unsigned not null, b int unsigned not null, c int unsigned not null) engine=innodb;
Query OK, 0 rows affected (0.55 sec)
mysql> insert into test (a,b,c) values (1,100,90), (1,300,50), (1,200,20), (2,200,30), (2,300,70), (2,50,100);
Query OK, 6 rows affected (0.39 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from test;
+----+---+-----+-----+
| id | a | b | c |
+----+---+-----+-----+
| 1 | 1 | 100 | 90 |
| 2 | 1 | 300 | 50 |
| 3 | 1 | 200 | 20 |
| 4 | 2 | 200 | 30 |
| 5 | 2 | 300 | 70 |
| 6 | 2 | 50 | 100 |
+----+---+-----+-----+
6 rows in set (0.00 sec)
mysql> select t1.* from test as t1 inner join (select t2.a, min(t2.b) as min_b from test as t2 group by t2.a) as subq on subq.a=t1.a and subq.min_b=t1.b;
+----+---+-----+-----+
| id | a | b | c |
+----+---+-----+-----+
| 1 | 1 | 100 | 90 |
| 6 | 2 | 50 | 100 |
+----+---+-----+-----+
2 rows in set (0.00 sec)
Use:
SELECT DISTINCT x.*
FROM TABLE x
JOIN (SELECT t.a,
             MIN(t.b) AS min_b
      FROM TABLE t
      GROUP BY t.a) y ON y.a = x.a
                     AND y.min_b = x.b
You're right: select min(b), a from table group by a. If you want the entire row, then you'd have to use an analytic (window) function. That depends on the database software.
It depends on the implementation, but this is usually faster than the self-join method:
SELECT id, a, b, c
FROM
(
    SELECT id, a, b, c
         , ROW_NUMBER() OVER(PARTITION BY a ORDER BY b ASC) AS [b IN a]
    FROM mytable
) As SubqueryA
WHERE [b IN a] = 1
Of course, it does require that your SQL implementation be fairly up to date with the standard.
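Since the question mentions MySQL: MySQL 8.0 and later support window functions, so a sketch of the same idea there (square-bracket identifier replaced with a plain alias, table name mytable as in the other answers) would be:

SELECT id, a, b, c
FROM (
    SELECT id, a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY b ASC) AS rn
    FROM mytable
) AS SubqueryA
WHERE rn = 1;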

How to merge MySQL queries with different column counts?

Definitions:
In the results, * denotes an empty column
The data in the tables is such that every field in the table has the value Fieldname + RowCount (so column 'a' in row 1 contains the value 'a1').
2 MySQL Tables
Table1
Fieldnames: a,b,c,d
Table2
Fieldnames: e,f,g,h,i,j
Task:
I want to get the first 4 rows from each of the tables.
Standalone Queries
SELECT Table1.* FROM Table1 WHERE 1 LIMIT 0,4 -- Colcount 4
SELECT Table2.* FROM Table2 WHERE 1 LIMIT 0,4 -- Colcount 6
A simple UNION of the queries fails because the two parts have different column counts.
Version1: add two empty fields to the first query
SELECT Table1.*,'' AS i,'' AS j FROM Table1 WHERE 1 LIMIT 0,4
UNION
SELECT Table2.* FROM Table2 WHERE 1 LIMIT 0,4
So I will get the following fields in the result set:
a,b,c,d,i,j
a1,b1,c1,d1,*,*,
a2,b2,c2,d2,*,*,
....
....
e1,f1,g1,h1,i1,j1
e2,f2,g2,h2,i2,j2
The problem is that the field names of Table2 are lost: the UNION result takes its column names from the first SELECT, i.e. from Table1.
Version2 - shift columns by using empty fields:
SELECT Table1.*,'','','','','','' FROM Table1 WHERE 1 LIMIT 0,4
UNION
SELECT '','','','',Table2.* FROM Table2 WHERE 1 LIMIT 0,4
So I will get the following fields in the result set:
a,b,c,d,'','','','','',''
a1,b1,c1,d1,*,*,*,*,*,*,
a2,b2,c2,d2,*,*,*,*,*,*,
....
....
*,*,*,*,e1,f1,g1,h1,i1,j1
*,*,*,*,e2,f2,g2,h2,i2,j2
....
....
Problem is solved but I get many empty fields.
Is there a known performance issue?
How do you solve this task?
Is there a best practice to solve this issue?
The output from a query should be a table, which is a set of rows, each row with the same set of column names and types. (There are some DBMS that support ragged rows - with different sets of columns, but that is not a mainstream feature.)
You have to decide how to handle two sets of four rows with different sets of columns in the two sets.
The simplest option, usually, is to do the two standalone queries. The two result sets are not comparable, and should not be conflated.
If you choose your Version 1, then you should decide which set of column names is appropriate, or create a composite set of names using 'AS x' column aliases.
If you choose your Version 2, then you should probably name the trailing columns of the first clause of the UNION; at the moment, they all have no name:
SELECT Table1.*, '' AS e, '' AS f, '' AS g, '' AS h, '' AS i, '' AS j
FROM Table1 WHERE 1 LIMIT 0,4
UNION
SELECT '' AS a, '' AS b, '' AS c, '' AS d, Table2.*
FROM Table2 WHERE 1 LIMIT 0,4
(The AS clauses in the second SELECT are redundant, but self-consistent; with them, the two halves of the UNION have the same column headings explicitly.)
Except that you have provided empty strings instead of NULL, the notation you have chosen corresponds to an 'OUTER UNION'. You can find occasional references to it in selected parts of the literature (E F Codd in the RM/V2 book; C J Date in critiques of all things OUTER). SQL 1999 provided it as a UNION JOIN; SQL 2003 removed UNION JOIN (that's pretty unusual - and damning of the feature).
I'd use two separate queries.
The thing that seems most sensible is your "version 2", except using NULLs instead of empty strings.
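A sketch of what that could look like, combining this suggestion with the column-naming advice above (table and column names as defined in the question; the parentheses keep each LIMIT attached to its own SELECT):

(SELECT a, b, c, d,
        NULL AS e, NULL AS f, NULL AS g, NULL AS h, NULL AS i, NULL AS j
 FROM Table1 LIMIT 0,4)
UNION ALL   -- UNION ALL, since the two halves cannot produce duplicate rows
(SELECT NULL, NULL, NULL, NULL,
        e, f, g, h, i, j
 FROM Table2 LIMIT 0,4);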
This took some thinking, and then some MySQL-specific workarounds. The concept is this: a join will produce the table structure you want. What you really want is a full outer join where no row 'matches'. To do this, we need a reliable way to ensure that rows don't match, and then we UNION a LEFT JOIN and a RIGHT JOIN to work around MySQL's lack of a FULL OUTER JOIN.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE A (a int, b int, c int, d int);
CREATE TABLE B (e int, f int, g int, h int, i int, j int);
INSERT INTO A VALUES (1,1,1,1),(2,2,2,2);
INSERT INTO B VALUES (8,8,8,8,8,8),(9,9,9,9,9,9);
Query 1:
SELECT * FROM
(SELECT * FROM (SELECT "TableA" as unique_field) as Ax CROSS JOIN A) as A
LEFT JOIN
(SELECT * FROM (SELECT "TableB" as unique_field) as Bx CROSS JOIN B) AS B
on A.unique_field=B.unique_field
UNION
SELECT * FROM
(SELECT * FROM (SELECT "TableA" as unique_field) as Ax CROSS JOIN A) as A
RIGHT JOIN
(SELECT * FROM (SELECT "TableB" as unique_field) as Bx CROSS JOIN B) AS B
on A.unique_field=B.unique_field
Results:
| unique_field | a | b | c | d | unique_field | e | f | g | h | i | j |
|--------------|--------|--------|--------|--------|--------------|--------|--------|--------|--------|--------|--------|
| TableA | 1 | 1 | 1 | 1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| TableA | 2 | 2 | 2 | 2 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| (null) | (null) | (null) | (null) | (null) | TableB | 8 | 8 | 8 | 8 | 8 | 8 |
| (null) | (null) | (null) | (null) | (null) | TableB | 9 | 9 | 9 | 9 | 9 | 9 |
This syntax: (SELECT * FROM (SELECT 1 as unique_field) as Ax CROSS JOIN A) as A is more easily understood as (SELECT 1 as unique_field, * FROM A) AS A, but MySQL doesn't allow an unqualified * to follow another item in the select list.
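A qualified star does work, though, so a slightly simpler equivalent (a sketch, reusing the table and field name from the query above) would be:

-- tbl_name.* may follow other select-list items, unlike a bare *
SELECT 'TableA' AS unique_field, A.* FROM A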