SQL query: same rows - sql

I'm having trouble finding the right SQL query. I want to select all the rows with a unique x value, and if several rows share the same x value, I want to select the one with the greatest y value. As an example, I've put part of my table below.
ID x y
1 2 3
2 1 5
3 4 6
4 4 7
5 2 6
The selected rows should then be those with ID 2, 4 and 5.
This is what I've got so far
SELECT *
FROM base
WHERE x IN
(
SELECT x
FROM base
HAVING COUNT(*) > 1
)
But this only results in the rows that occur more than once. I've added the tags R, postgresql and sqldf because I'm working in R with those packages.

Here is a typical way to formulate the query in ANSI SQL:
select b.*
from base b
where not exists (select 1
from base b2
where b2.x = b.x and
b2.y > b.y
);
In Postgres, you would use distinct on for performance:
select distinct on (x) b.*
from base b
order by x, y desc;
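
As a quick check of the NOT EXISTS formulation above, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database purely for convenience (an assumption of this sketch, not the asker's R/sqldf setup); the distinct on variant is Postgres-specific and is not exercised here.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 1, 5), (3, 4, 6), (4, 4, 7), (5, 2, 6)],
)

# Keep a row only if no other row with the same x has a larger y.
rows = conn.execute(
    """
    SELECT b.*
    FROM base b
    WHERE NOT EXISTS (
        SELECT 1
        FROM base b2
        WHERE b2.x = b.x AND b2.y > b.y
    )
    ORDER BY b.id
    """
).fetchall()

print(rows)  # [(2, 1, 5), (4, 4, 7), (5, 2, 6)]
```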

You could try this query:
select x, max(y) from base group by x;
And, if you'd also like the id column in the result:
select base.*
from base join (select x, max(y) as max_y from base group by x) as maxima
on (base.x = maxima.x and base.y = maxima.max_y);
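
One property of the join approach that is worth knowing: if two rows tie for the largest y within the same x, both are returned. The sketch below illustrates this with Python's sqlite3 module; the extra row with id 6 is a hypothetical addition to the question's data, and the alias max_y is chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 1, 5), (3, 4, 6), (4, 4, 7), (5, 2, 6)],
)

# Join every row to the per-x maximum and keep the rows that reach it.
QUERY = """
    SELECT base.*
    FROM base
    JOIN (SELECT x, MAX(y) AS max_y FROM base GROUP BY x) AS maxima
      ON base.x = maxima.x AND base.y = maxima.max_y
    ORDER BY base.id
"""

ids_before = [row[0] for row in conn.execute(QUERY)]
print(ids_before)  # [2, 4, 5]

# Hypothetical extra row that ties with id 2 on (x=1, y=5).
conn.execute("INSERT INTO base VALUES (6, 1, 5)")
ids_after = [row[0] for row in conn.execute(QUERY)]
print(ids_after)  # [2, 4, 5, 6]
```

If exactly one row per x is required even when there are ties, an additional tie-breaker (for example on id) is needed.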

Example:
CREATE TABLE tmp(id int, x int ,y int);
INSERT INTO .....
test=# SELECT x, max(y) AS y FROM tmp GROUP BY x;
x | y
---+---
4 | 7
1 | 5
2 | 6

Related

SQL: Efficient way to get group by results including all table columns

Let's consider a simple table below.
id code marks grade
1 X 100 A
2 Y 120 B
3 Z 130 A
4 X 120 C
5 Y 100 A
6 Z 110 B
7 X 150 A
8 X 140 C
Goal: Get maximum marks for each grade, return all the columns.
id code marks grade
7 X 150 A
2 Y 120 B
8 X 140 C
This is very simple if I don't need the id and code columns:
select grade, max(marks)
from table
group by grade;
What would be the most efficient query to also get the id and code columns along with the result above?
I tried something like this, which didn't work:
select * from table t
inner join
(select grade, max(marks)
from table
group by grade) a
on a.grade=t.grade;
In Postgres the most efficient way for this kind of query is to use (the proprietary) distinct on ()
select distinct on (grade) *
from the_table t
order by grade, marks desc;
Are you looking for a correlated subquery?
select t.*
from t
where t.marks = (select max(t2.marks) from t t2 where t2.grade = t.grade);
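
To see the correlated subquery at work on the question's data, here is a minimal sketch using Python's sqlite3 module with an in-memory table named t (the table name follows the answer above; running it in SQLite rather than Postgres is an assumption made only so the example is self-contained).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, code TEXT, marks INTEGER, grade TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "X", 100, "A"), (2, "Y", 120, "B"), (3, "Z", 130, "A"),
        (4, "X", 120, "C"), (5, "Y", 100, "A"), (6, "Z", 110, "B"),
        (7, "X", 150, "A"), (8, "X", 140, "C"),
    ],
)

# For each row, compare its marks with the maximum marks of its own grade.
top_rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE t.marks = (SELECT MAX(t2.marks) FROM t t2 WHERE t2.grade = t.grade)
    ORDER BY t.grade
    """
).fetchall()

print(top_rows)  # [(7, 'X', 150, 'A'), (2, 'Y', 120, 'B'), (8, 'X', 140, 'C')]
```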

Merge two tables and chain id fields in order

I'm looking for a way to merge two (or more) tables and renumber their numeric ids in order. Schematically, here is what I want to do:
Table example 1 :
Id Field
4 x
1 x
5 x
3 x
2 x
Table example 2 :
Id Field
1 x
3 x
5 x
2 x
4 x
Expected result (modify table 1 as 1-2-3-4-5 and table 2 as 6-7-8-9-10 THEN order both id by asc)
Id Field
1 x
2 x
3 x
4 x
5 x
6 x
7 x
8 x
9 x
10 x
I was aiming for a union of the tables nested in a select row_number() over (order by id), but I don't know how to renumber table 2 as 6-7-8-9-10 first.
Try using this example:
SELECT id, Field FROM t1
UNION ALL
SELECT (SELECT MAX(id) FROM t1) + ROW_NUMBER() OVER (ORDER BY id) AS id, Field
FROM t2
ORDER BY id
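
The query above can be tried on the example tables with the following minimal sketch. It uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, since that is when window functions such as ROW_NUMBER() became available there; the table names t1 and t2 follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, field TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, field TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, 'x')", [(4,), (1,), (5,), (3,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?, 'x')", [(1,), (3,), (5,), (2,), (4,)])

# t1 keeps its ids; t2 is renumbered to start right after the largest id in t1.
merged = conn.execute(
    """
    SELECT id, field FROM t1
    UNION ALL
    SELECT (SELECT MAX(id) FROM t1) + ROW_NUMBER() OVER (ORDER BY id) AS id, field
    FROM t2
    ORDER BY id
    """
).fetchall()

merged_ids = [row[0] for row in merged]
print(merged_ids)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Note that this only produces a gap-free sequence when the ids in t1 are already 1 to N; if they are not, t1 would need its own ROW_NUMBER() as well.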

How can I select rows corresponding to the unique pair of column values with the highest value of another column in PostgreSQL?

My table looks like this:
A B X
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
2 2 1
2 2 2
2 2 3
I need to select the row with the highest value in X column for each unique A, B pair.
The result would be:
A B X
1 1 3
1 2 2
2 2 3
I would recommend distinct on:
select distinct on (a, b) t.*
from t
order by a, b, x desc;
This allows you to select other columns from the rows, other than a, b, and x.
With an index on (a, b, x desc), this would typically be the fastest solution.
You can use the MAX aggregate function as follows:
select A, B, MAX(X) AS X
from YOUR_TABLE
group by A, B
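A correlated subquery would also work, as long as it is restricted to the same (a, b) pair:
select * from a t where t.x = (select max(t2.x) from a t2 where t2.a = t.a and t2.b = t.b)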
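
To make the difference between a per-pair maximum and a table-wide maximum concrete, here is a minimal sketch on the question's data using Python's sqlite3 module. The table name t is an assumption for this sketch. Comparing against the global maximum silently drops the pair (1, 2), whose largest X is only 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, x INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
     (2, 2, 1), (2, 2, 2), (2, 2, 3)],
)

# Maximum of x within each (a, b) pair: one row per pair.
per_pair = conn.execute(
    "SELECT a, b, MAX(x) AS x FROM t GROUP BY a, b ORDER BY a, b"
).fetchall()
print(per_pair)  # [(1, 1, 3), (1, 2, 2), (2, 2, 3)]

# Comparing against the maximum of the whole table loses the pair (1, 2).
global_max = conn.execute(
    "SELECT a, b, x FROM t WHERE x = (SELECT MAX(x) FROM t) ORDER BY a, b"
).fetchall()
print(global_max)  # [(1, 1, 3), (2, 2, 3)]
```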

Convert rows to columns by same column value in Postgres

I have a table like:
id name value
--------------------
1 x 100
1 y 200
1 z 300
2 x 10
2 y abc
2 z 001
3 x 1
...
--------------------
and I need to transform it into something like that:
id x y z
---------------------
1 100 200 300
2 10 abc 001
3 1 ...
---------------------
Names are determined. I could make multiple joins but I'm looking for a more elegant solution.
Use conditional aggregation which in Postgres uses the filter syntax:
select id,
max(value) filter (where name = 'x') as x,
max(value) filter (where name = 'y') as y,
max(value) filter (where name = 'z') as z
from t
group by id;
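
The same pivot can be written with CASE expressions inside the aggregate, which is a portable equivalent of the filter syntax. The sketch below runs that variant on the rows shown in the question using Python's sqlite3 module (an assumption made so the example is self-contained). Only the one row visible for id 3 is loaded, because the rest is elided in the question, so its y and z come out as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "x", "100"), (1, "y", "200"), (1, "z", "300"),
        (2, "x", "10"), (2, "y", "abc"), (2, "z", "001"),
        (3, "x", "1"),  # further rows for id 3 are elided in the question
    ],
)

# Each CASE yields the value only for the matching name and NULL otherwise;
# MAX ignores the NULLs and returns the single remaining value per id.
pivoted = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN name = 'x' THEN value END) AS x,
           MAX(CASE WHEN name = 'y' THEN value END) AS y,
           MAX(CASE WHEN name = 'z' THEN value END) AS z
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(pivoted)
# [(1, '100', '200', '300'), (2, '10', 'abc', '001'), (3, '1', None, None)]
```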
The additional module tablefunc provides variants of the crosstab() function, which is typically fastest:
SELECT *
FROM crosstab(
'SELECT id, name, value
FROM tbl
ORDER BY 1, 2'
) AS ct (id int, x text, y text, z text);
You seem to have a mix of numbers and strings in your value, so I chose text as output.
See:
PostgreSQL Crosstab Query

PLSQL or SSRS, How to select having all values in a group?

I have a table like this.
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
4 B X
5 B Y
6 C X
7 C Z
8 D Z
9 E X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN ('X','Z')
This query gives me
ID NAME VALUE
______________
1 A X
3 A Z
4 B X
6 C X
7 C Z
8 D Z
9 E X
But I want to see all rows for the names that have all of the params. Only A and C have both X and Z values, so my desired result is:
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
6 C X
7 C Z
How can I get the desired result? Either SQL or Reporting Services would be fine. Maybe a "GROUP BY ..... HAVING" clause will help, but I'm not sure.
By the way, I don't know how many params will be in the list.
I really appreciate any help.
The standard approach would be something like
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
FROM table1 b
WHERE b.value in ('X','Y')
GROUP BY name
HAVING COUNT(distinct value) = 2)
That would require that you determine how many values are in the list so that you can use a 2 in the HAVING clause if there are 2 elements, a 5 if there are 5 elements, etc. You could also use analytic functions
SELECT id, name, value
FROM (SELECT id,
name,
value,
count(distinct value) over (partition by name) cnt
FROM table1 t1
WHERE t1.value in ('X','Y'))
WHERE cnt = 2
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
from t
group by name
having sum(case when value = 'X' then 1 else 0 end) > 0 and
sum(case when value = 'Y' then 1 else 0 end) > 0
)
The subquery for the in finds all names that have at least one X value and one Y value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Y but not Z; and so on). The outer query just returns all the rows instead of the names.
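
Since the question mentions that the number of params is not known in advance, one possible way to apply the COUNT(DISTINCT ...) idea is to build both the placeholder list and the expected count from the parameter list at run time. The following is a minimal sketch under that assumption, using Python's sqlite3 module and the question's data; the variable name wanted is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"), (4, "B", "X"), (5, "B", "Y"),
     (6, "C", "X"), (7, "C", "Z"), (8, "D", "Z"), (9, "E", "X")],
)

wanted = ["X", "Z"]  # hypothetical parameter list; any length works

# One "?" per wanted value, so the values are bound rather than interpolated.
placeholders = ", ".join("?" for _ in wanted)
query = f"""
    SELECT id, name, value
    FROM table1
    WHERE name IN (
        SELECT name
        FROM table1
        WHERE value IN ({placeholders})
        GROUP BY name
        HAVING COUNT(DISTINCT value) = ?
    )
    ORDER BY id
"""

# The expected count is the number of distinct wanted values.
result = conn.execute(query, [*wanted, len(set(wanted))]).fetchall()
print(result)
# [(1, 'A', 'X'), (2, 'A', 'Y'), (3, 'A', 'Z'), (6, 'C', 'X'), (7, 'C', 'Z')]
```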