SQL query to bring back max of multiple rows

I have a table that looks something like this:
A B C
1 2 2
3 4 6
1 2 3
3 4 5
3 4 4
1 2 1
Given an array of two-element tuples representing (A, B) pairs, I need to bring back, for each pair, the row that has the maximum C.
For example, if my array was [(1,2)], then my resulting table should be the following, since for A=1 and B=2 the maximum C is 3:
A B C
1 2 3
If my array was [(1,2), (3,4)], then my resulting table should be the following, since for A=1 and B=2 the maximum C is 3 and for A=3 and B=4 the maximum C is 6:
A B C
1 2 3
3 4 6
I feel this can probably be done using an SQL subquery. Something along the lines of:
SELECT A, B, C
FROM my_table
WHERE my_array IN (SELECT A, B, C, MAX(C)
FROM my_table
WHERE **not sure what goes here**)
Is there a benefit to doing this in a single SQL query rather than doing an SQL query for each element of my array?

Use GROUP BY and MAX:
SELECT A, B, MAX(C) AS C
FROM tab
GROUP BY A, B
ORDER BY A, B;
EDIT:
If you are on SQL Server, you could use a table variable (table variables are prefixed with @; # denotes a temp table):
DECLARE @my_array AS TABLE(A INT, B INT);
INSERT INTO @my_array(A, B) VALUES (1,2);
SELECT t.A, t.B, MAX(t.C) AS C
FROM tab t
JOIN @my_array ma
ON t.A = ma.A AND t.B = ma.B
GROUP BY t.A, t.B
ORDER BY t.A, t.B;
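The join-and-group idea above can be tried without SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a temp table as a stand-in for the table variable; the helper name max_c_for_pairs is hypothetical and the data is the sample table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (A INT, B INT, C INT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, 2, 2), (3, 4, 6), (1, 2, 3), (3, 4, 5), (3, 4, 4), (1, 2, 1)],
)


def max_c_for_pairs(conn, pairs):
    """Return (A, B, max C) for each requested (A, B) pair (hypothetical helper)."""
    # Assumption: a temp table plays the role of the SQL Server table variable.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS my_array (A INT, B INT)")
    conn.execute("DELETE FROM my_array")
    conn.executemany("INSERT INTO my_array VALUES (?, ?)", pairs)
    return conn.execute(
        """
        SELECT t.A, t.B, MAX(t.C) AS C
        FROM tab t
        JOIN my_array ma ON t.A = ma.A AND t.B = ma.B
        GROUP BY t.A, t.B
        ORDER BY t.A, t.B
        """
    ).fetchall()


print(max_c_for_pairs(conn, [(1, 2)]))
print(max_c_for_pairs(conn, [(1, 2), (3, 4)]))
```

One round trip handles any number of pairs, which is the usual argument for a single query over one query per array element.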

SELECT A, B, MAX(C) AS C FROM #temp where a=1 and b=2 GROUP BY A, B

A correlated subquery will do that:
select t1.*
from my_table t1
where (t1.a, t1.b) in ( values (1,2), (3,4) )
and t1.c = (select max(c)
from my_table t2
where t1.a = t2.a
and t1.b = t2.b)
order by t1.a, t1.b;
You can avoid the correlated subquery if you want to. Sometimes they tend to be slower than equivalent solutions:
select t1.*
from my_table t1
where (t1.a, t1.b, t1.c) in (select t2.a, t2.b, max(t2.c)
from my_table t2
where (t2.a, t2.b) in (values (1,2), (3,4))
group by t2.a, t2.b)
order by t1.a, t1.b;
The part with values (1,2), (3,4) is your "array input".
The above is standard ANSI SQL.
Online example: http://rextester.com/QPY97515
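To pass the "array input" from application code, one option is to generate the values list with bound parameters. This is a rough sketch against SQLite (which accepts row values with IN from version 3.15 on), not a definitive implementation; the function name rows_with_max_c is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 2, 2), (3, 4, 6), (1, 2, 3), (3, 4, 5), (3, 4, 4), (1, 2, 1)],
)


def rows_with_max_c(conn, pairs):
    """Return the full rows holding the maximum c for each (a, b) pair."""
    if not pairs:
        return []  # an empty VALUES list would be a syntax error
    # Only "(?, ?)" placeholders are interpolated; the values stay bound parameters.
    placeholders = ", ".join(["(?, ?)"] * len(pairs))
    sql = f"""
        SELECT t1.a, t1.b, t1.c
        FROM my_table t1
        WHERE (t1.a, t1.b) IN (VALUES {placeholders})
          AND t1.c = (SELECT MAX(t2.c)
                      FROM my_table t2
                      WHERE t2.a = t1.a AND t2.b = t1.b)
        ORDER BY t1.a, t1.b
    """
    params = [value for pair in pairs for value in pair]
    return conn.execute(sql, params).fetchall()


print(rows_with_max_c(conn, [(1, 2), (3, 4)]))
```

Unlike the GROUP BY form, this returns whole rows, so if two rows tie for the maximum c within a pair, both come back.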

Related

I have 3 columns [A, B, C] in my SQL table. I want to find table entries, where values in A is same, in B is same, but C is different

I have 3 columns [A, B, C] in my SQL table. I want to find the table entries where the value in A is the same and the value in B is the same, but C is different.
A B C
1 2 3
4 5 6
*3 4 5*
*3 4 6*
*7 8 9*
6 1 2
*7 8 3*
I want to preferably get something like:
A B C
3 4 5
3 4 6
7 8 9
7 8 3
as my result.
Thanks :)
The core of the solution below is to aggregate over your table on both columns A and B, and then retain those groups having more than one C value. Then join your full table to this aggregation query to retain only the records you want.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT A, B
FROM yourTable
GROUP BY A, B
HAVING COUNT(DISTINCT C) > 1
) t2
ON t1.A = t2.A AND t1.B = t2.B
ORDER BY
t1.A, t1.B;
Here is a demo in MySQL, though the above query should run on pretty much any other database with little modification.
Try this:
select A,B,C from (
select A,B,C, avg(C * 1.0) over (partition by A,B) [avg] from MY_TABLE
) a where [avg] <> C
The idea behind it is simple: if all numbers within a set are equal, they are also equal to the average of the set.
This one should work too:
SELECT DISTINCT
t1.*
FROM
test t1
INNER JOIN
test t2 ON t2.a = t1.a
AND t2.b = t1.b
AND t2.c <> t1.c;
I'm not sure about performance due to lots of duplicates being generated/truncated compared to other solutions, though.
No need to count or rank; you only want to check if at least one qualifying row EXISTS
select *
from thetable tt
where exists(
select * from thetable x
where x.a = tt.a and x.b = tt.b
and x.c <> tt.c
);
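As a quick check, the aggregate-and-join query and the EXISTS query can be run side by side on the sample data from the question. This is a minimal sketch using Python's sqlite3 module; the table name thetable follows the last answer and the constant names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?, ?)",
    [(1, 2, 3), (4, 5, 6), (3, 4, 5), (3, 4, 6), (7, 8, 9), (6, 1, 2), (7, 8, 3)],
)

# Keep (a, b) groups with more than one distinct c, then join back to the rows.
HAVING_SQL = """
SELECT t1.a, t1.b, t1.c
FROM thetable t1
INNER JOIN (
    SELECT a, b
    FROM thetable
    GROUP BY a, b
    HAVING COUNT(DISTINCT c) > 1
) t2 ON t1.a = t2.a AND t1.b = t2.b
ORDER BY t1.a, t1.b, t1.c
"""

# Keep a row if some other row shares (a, b) but has a different c.
EXISTS_SQL = """
SELECT tt.a, tt.b, tt.c
FROM thetable tt
WHERE EXISTS (
    SELECT 1 FROM thetable x
    WHERE x.a = tt.a AND x.b = tt.b AND x.c <> tt.c
)
ORDER BY tt.a, tt.b, tt.c
"""

having_rows = conn.execute(HAVING_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(having_rows)
print(exists_rows)
```

Both queries return the four starred rows on this data. They can differ when exact duplicate rows exist alongside a differing row only in how many copies come back, since neither removes duplicates.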

DB2 - Where Subquery

I currently have a problem in db2 with the following tables:
Table_1
A B C
1 2 1
2 1 2
3 2 2
4 1 1
Table_2
A
1
I want to select all table_1 records with a B or C greater than the greatest A from table_2. The result should be:
Query
A B C
1 2 1
2 1 2
3 2 2
which I currently achieve with this query:
select A, B, C
from Table_1
where B > (select max(A) from Table_2)
or C > (select max(A) from Table_2)
Is it possible to only issue one subselect in the where clause to improve performance?
I would write it as:
select A, B, C
from Table_1
where MAX(B,C) > (select max(A) from Table_2)
Note: untested as I have no DB2 database handy.
Here it is in an SQLfiddle in MySQL syntax: http://sqlfiddle.com/#!9/2f89c5/3
If that is what you need, then move the subquery to the FROM clause:
select t1.A, t1.B, t1.C
from Table_1 t1 cross join
(select max(A) as maxA from table_2) t2
where t1.B > t2.maxA or t1.C > t2.maxA;
Think the other way around: you want all table_1 records for which there is no table_2 record with too great an A value:
select *
from table_1
where not exists
(
select *
from table_2
where table_2.a >= table_1.b
and table_2.a >= table_1.c
);
By the way: having just one subquery is great for maintainability, since a change to it has to be made in one place only. As for speed: in your query the subquery is not correlated to the main query, so it will probably be run just once and its result applied to all checks. (It would be wasteful for the DBMS to run the same subquery again and again for each record and column in table_1.)
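The variants above can be compared on the sample tables. This is a sketch only, using SQLite through Python's sqlite3 module as a stand-in for DB2 (an assumption; SQLite's multi-argument max() is used in place of DB2's scalar MAX), so it shows that the queries agree, not how DB2 plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (A INT, B INT, C INT);
CREATE TABLE Table_2 (A INT);
INSERT INTO Table_1 VALUES (1, 2, 1), (2, 1, 2), (3, 2, 2), (4, 1, 1);
INSERT INTO Table_2 VALUES (1);
""")

QUERIES = {
    # Original form: the same uncorrelated subselect written twice.
    "two_subselects": """
        SELECT A, B, C FROM Table_1
        WHERE B > (SELECT MAX(A) FROM Table_2)
           OR C > (SELECT MAX(A) FROM Table_2)
        ORDER BY A""",
    # Scalar max of the two columns, compared once.
    "scalar_max": """
        SELECT A, B, C FROM Table_1
        WHERE MAX(B, C) > (SELECT MAX(A) FROM Table_2)
        ORDER BY A""",
    # Subquery moved to the FROM clause and cross joined.
    "cross_join": """
        SELECT t1.A, t1.B, t1.C
        FROM Table_1 t1
        CROSS JOIN (SELECT MAX(A) AS maxA FROM Table_2) t2
        WHERE t1.B > t2.maxA OR t1.C > t2.maxA
        ORDER BY t1.A""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```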

Redshift Join VS. Union with Group By

Let's say I would like to pull the fields dim, a, b, c, d from 2 tables, one of which contains a, b and the other c, d.
I'm wondering if there's a preferred way (between the following) to do it, performance-wise:
1:
select t1.dim,a,b,c,d
from
(select dim,sum(a) as a,sum(b)as b from t1 group by dim)t1
join
(select dim,sum(c) as c,sum(d) as d from t2 group by dim)t2
on t1.dim=t2.dim;
2:
select dim,sum(a) as a,sum(b) as b,sum(c) as c,sum(d) as d
from
(
select dim,a,b,null as c, null as d from t1
union
select dim,null as a, null as b, c, d from t2
)a
group by dim
This is for a large amount of data (5-30M records in the final query).
Thanks!
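The first method filters out any dim values that are not in both tables. The second uses union, which is inefficient because it has to remove duplicate rows (and removing them also changes the sums whenever identical rows occur). So, neither is appealing.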
I would go for:
select dim, sum(a) as a, sum(b) as b, sum(c) as c, sum(d) as d
from (select dim, a, b, null as c, null as d from t1
union all
select dim, null as a, null as b, c, d from t2
) a
group by dim;
You could also pre-aggregate the values in each subquery. Or use full outer join for the first method.
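The difference between the three forms shows up even on a tiny data set. The sketch below uses SQLite through Python's sqlite3 module rather than Redshift (an assumption), with made-up rows: a duplicated row in t1 and dim values that exist in only one table. It illustrates the results, not Redshift performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (dim TEXT, a INT, b INT);
CREATE TABLE t2 (dim TEXT, c INT, d INT);
INSERT INTO t1 VALUES ('x', 1, 1), ('x', 1, 1), ('y', 2, 2);
INSERT INTO t2 VALUES ('x', 10, 10), ('z', 5, 5);
""")


def stacked_sums(set_op):
    """Stack t1 and t2 with the given set operator, then sum per dim."""
    return conn.execute(
        f"""
        SELECT dim, SUM(a), SUM(b), SUM(c), SUM(d)
        FROM (SELECT dim, a, b, NULL AS c, NULL AS d FROM t1
              {set_op}
              SELECT dim, NULL AS a, NULL AS b, c, d FROM t2) s
        GROUP BY dim
        ORDER BY dim
        """
    ).fetchall()


union_all_rows = stacked_sums("UNION ALL")  # keeps every input row
union_rows = stacked_sums("UNION")          # collapses the duplicated ('x', 1, 1) row

# Method 1: inner join of two pre-aggregated subqueries.
join_rows = conn.execute(
    """
    SELECT s1.dim, a, b, c, d
    FROM (SELECT dim, SUM(a) AS a, SUM(b) AS b FROM t1 GROUP BY dim) s1
    JOIN (SELECT dim, SUM(c) AS c, SUM(d) AS d FROM t2 GROUP BY dim) s2
      ON s1.dim = s2.dim
    ORDER BY s1.dim
    """
).fetchall()

print(union_all_rows)
print(union_rows)
print(join_rows)
```

Only the union all version keeps all three dim values with the full sums; the inner join keeps just 'x', and union undercounts a and b for 'x'.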

Reducing the list of results (SQL)

I have been stuck on an SQL statement for 2 days now and I hope you can help me with it.
The result of my select is a list with 4 attributes A, B, C and D (below is an example list of 5 datasets):
1. A=1 B=100 C=200 D=300
2. A=2 B=200 C=100 D=300
3. A=3 B=300 C=200 D=100
4. A=3 B=100 C=100 D=200
5. A=3 B=300 C=100 D=200
The list shall be reduced, so that every attribute A is in the list only once.
In the example above the dataset 1. and 2. should be in the list, because A=1 and A=2 exists only once.
For A=3 I have to build a query to identify the dataset, that will be in the final list. Some rules should apply:
Take the dataset with the highest value of B; if not distinct then
Take the dataset with the highest value of C; if not distinct then
Take the dataset with the highest value of D.
In the example above the dataset 3. should be taken.
The expected result is:
1. A=1 B=100 C=200 D=300
2. A=2 B=200 C=100 D=300
3. A=3 B=300 C=200 D=100
I hope you understand my problem. I've tried various versions of SELECT-statements with HAVING and EXISTS (or NOT EXISTS), but my SQL knowledge isn't enough.
Probably there is an easier way to solve this problem, but this one works:
CREATE TEMP TABLE TEST (
A INTEGER,
B INTEGER,
C INTEGER,
D INTEGER
);
insert into TEST values (1,1,1,1);
insert into TEST values (2,1,5,1);
insert into TEST values (2,2,1,1);
insert into TEST values (3,1,4,1);
insert into TEST values (3,2,1,4);
insert into TEST values (3,2,3,1);
insert into TEST values (3,3,1,5);
insert into TEST values (3,3,2,3);
insert into TEST values (3,3,2,7);
insert into TEST values (3,3,3,1);
insert into TEST values (3,3,3,2);
select distinct
t1.A,
t2.B as B,
t3.C as C,
t4.D as D
from TEST t1
join (select A ,MAX (B) as B from TEST group by A)t2 on t2.A=t1.A
join (select A, B, MAX(C) as C from TEST group by A,B)t3 on t3.A=t2.A and t3.B=t2.B
join (select A, B, C, MAX (D) as D from TEST group by A,B,C)t4 on t4.A=t3.A and t4.B=t3.B and t4.C=t3.C;
Result:
a b c d
1 1 1 1
2 2 1 1
3 3 3 2
Tested on IBM Informix Dynamic Server Version 11.10.FC3.
This type of prioritization query is most easily done with row_number(), but I don't think Informix supports that.
So, one method is to enumerate the rows using a correlated subquery:
select t.*
from (select t.*,
(select count(*)
from t t2
where t2.a = t.a and -- only compare rows that share the same A
((t2.b > t.b) or
(t2.b = t.b and t2.c > t.c) or
(t2.b = t.b and t2.c = t.c and t2.d > t.d))
) as NumGreater
from t
) t
where NumGreater = 0;
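The counting approach can be checked against the five datasets from the question. This is a minimal sketch in SQLite via Python's sqlite3 module (an assumption, since the thread targets Informix), with the comparison restricted to rows that share the same A so that one row per A survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 100, 200, 300),
        (2, 200, 100, 300),
        (3, 300, 200, 100),
        (3, 100, 100, 200),
        (3, 300, 100, 200),
    ],
)

# For each row, count the rows with the same a that rank higher by (b, c, d).
# A row with no higher-ranked sibling is the one to keep.
PICK_BEST_SQL = """
SELECT a, b, c, d
FROM (SELECT t.*,
             (SELECT COUNT(*)
              FROM t t2
              WHERE t2.a = t.a
                AND (t2.b > t.b
                     OR (t2.b = t.b AND t2.c > t.c)
                     OR (t2.b = t.b AND t2.c = t.c AND t2.d > t.d))
             ) AS NumGreater
      FROM t) ranked
WHERE NumGreater = 0
ORDER BY a
"""

best_rows = conn.execute(PICK_BEST_SQL).fetchall()
print(best_rows)
```

If two rows for the same A are identical in B, C and D, both have a count of zero and both are returned, so an extra tie-breaker would be needed in that case.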
I have no idea about Informix, but you can try this. The same pattern works in SQL Server (with TOP 1 instead of FIRST 1), so maybe it will also work in Informix. It assumes the table has a unique id column:
select * from tablename t1
where id = (select first 1 id from tablename t2
where t2.A = t1.A order by B desc, C desc, D desc)
SELECT A, MAX(B) AS B, MAX(C) AS C, MAX(D) AS D
FROM table_name
GROUP BY A

Where one or another column exists in a sub select

I'm looking to do something like this:
SELECT a, b, c, d FROM someTable
WHERE a in (SELECT testA FROM otherTable);
Only I want to be able to test if either of 2 columns exists in a sub select of 2 columns.
SELECT a, b, c, d FROM someTable
WHERE a OR b in (SELECT testA, testB FROM otherTable);
We are using MS SQL Server 2012
Try this
SELECT a, b, c, d
FROM someTable
WHERE a IN (SELECT testA FROM otherTable)
OR b IN (SELECT testB FROM otherTable)
or
SELECT a, b, c, d
FROM someTable
WHERE EXISTS
(SELECT NULL
FROM otherTable
WHERE testA = a OR testB = a
OR testA = b OR testB = b)
UPDATE:
You may need to add an index on the testB column if performance is bad.
Another option on MS SQL is CROSS APPLY:
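SELECT a, b, c, d
FROM someTable ST
CROSS APPLY (
-- TOP (1) keeps one match per outer row; the derived table needs an alias and a column name
SELECT TOP (1) 1 AS matched
FROM otherTable OT
WHERE OT.testA = ST.a OR OT.testB = ST.b
) AS m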
If none of these work well, try using UNION. UNION often gives better performance than OR:
SELECT a, b, c, d
FROM someTable
WHERE a IN (SELECT testA FROM otherTable)
UNION
SELECT a, b, c, d
FROM someTable
WHERE b IN (SELECT testB FROM otherTable)
UPDATE 2:
For further reading on OR and UNION differences
Why is UNION faster than an OR statement
Try this..
SELECT a, b, c, d
FROM someTable
WHERE Exists
(
SELECT 1
FROM otherTable
Where a = testA OR b = testB
)
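To see how the EXISTS and UNION forms behave, here is a small sketch using SQLite through Python's sqlite3 module instead of SQL Server (an assumption), with invented rows in which one someTable row matches two otherTable rows. It also runs a plain join on the OR condition to show why EXISTS is convenient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE someTable (a INT, b INT, c INT, d INT);
CREATE TABLE otherTable (testA INT, testB INT);
INSERT INTO someTable VALUES (1, 10, 100, 1000), (2, 20, 200, 2000), (3, 30, 300, 3000);
INSERT INTO otherTable VALUES (1, 99), (1, 10), (98, 20);
""")

# EXISTS: each someTable row is returned at most once.
exists_rows = conn.execute("""
    SELECT a, b, c, d
    FROM someTable
    WHERE EXISTS (SELECT 1 FROM otherTable
                  WHERE a = testA OR b = testB)
    ORDER BY a
""").fetchall()

# UNION of two single-column checks: duplicates are removed by UNION itself.
union_rows = conn.execute("""
    SELECT a, b, c, d FROM someTable
    WHERE a IN (SELECT testA FROM otherTable)
    UNION
    SELECT a, b, c, d FROM someTable
    WHERE b IN (SELECT testB FROM otherTable)
    ORDER BY a
""").fetchall()

# Plain join on the OR condition: one output row per matching pair.
join_rows = conn.execute("""
    SELECT s.a, s.b, s.c, s.d
    FROM someTable s
    JOIN otherTable o ON s.a = o.testA OR s.b = o.testB
    ORDER BY s.a
""").fetchall()

print(exists_rows)
print(union_rows)
print(len(join_rows))
```

The first two queries agree, while the plain join repeats the row with a = 1 because it matches two otherTable rows.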
If I'm understanding your question correctly, LEFT JOIN is probably the way to go here:
SELECT a, b, c, d
FROM TableA ta
LEFT JOIN TableB tb
ON ta.a = tb.a
AND ta.b = tb.b
WHERE tb.a IS NOT NULL
AND tb.c IS NOT NULL
You could also use UNION and INNER JOIN:
SELECT a, b, c, d
FROM someTable
INNER JOIN OtherTable OT on someTable.B = OT.testB
UNION
SELECT a, b, c, d
FROM someTable
INNER JOIN OtherTable OT ON someTable.A= OT.testA
Note that the JOIN approach can be much faster if the joined columns are indexed.
Joins seem to be one option; have you thought about using them with a UNION?
SELECT a, b, c, d
FROM someTable
INNER JOIN OtherTable OT on someTable.B = OT.testB
UNION
SELECT a, b, c, d
FROM someTable
INNER JOIN OtherTable OT ON someTable.A= OT.testA