SQL to get unique rows in Netezza DB

I have a table with rows like:
id group_name_code
1 999
2 16
3 789
4 999
5 231
6 999
7 349
8 16
9 819
10 999
11 654
But I want output rows like this:
id group_name_code
1 999
2 16
3 789
4 231
5 349
6 819
7 654
Will this query help?
select id, distinct(group_name_code) from group_table;

You seem to want:
distinct values of group_name_code, plus a sequential id ordered by the minimum id of each group_name_code.
Netezza has the DISTINCT keyword, but not DISTINCT ON () (a Postgres feature):
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_select.html
You could use:
SELECT DISTINCT group_name_code FROM group_table;
Note: no parentheses. DISTINCT is a keyword that applies to the whole select list, not a function applied to one column.
But this would not give you the sequential id shown in your expected output.
There are "analytic functions" a.k.a. window functions:
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_overview_analytic_funcs.html
And there is also row_number():
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_functions.html
So this should work:
SELECT row_number() OVER (ORDER BY min(id)) AS new_id, group_name_code
FROM group_table
GROUP BY group_name_code
ORDER BY min(id);
Or, in case Netezza does not allow nesting an aggregate function inside a window function, use a subquery:
SELECT row_number() OVER (ORDER BY id) AS new_id, group_name_code
FROM (
SELECT min(id) AS id, group_name_code
FROM group_table
GROUP BY group_name_code
) sub
ORDER BY id;

If you don't mind losing the other id values, you can apply an aggregate function to that column and group by group_name_code:
select min(id) as id, group_name_code
from group_table
group by group_name_code
order by id;
This way you pull unique values for group_name_code and the lowest id for each code.
If you don't need id in your output (it doesn't seem to correspond to the input table) and just want the unique codes, try this:
select group_name_code
from group_table
group by group_name_code
order by min(id);
This gets the codes you want. If you want id to be a row number, how to do that depends on which RDBMS you are using.
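A quick way to see what each query in this answer returns is to run them on the sample data. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption; the question is about Netezza). The codes-only query sorts by min(id), because id itself is neither grouped nor selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_table (id INTEGER, group_name_code INTEGER)")
conn.executemany(
    "INSERT INTO group_table VALUES (?, ?)",
    [(1, 999), (2, 16), (3, 789), (4, 999), (5, 231), (6, 999),
     (7, 349), (8, 16), (9, 819), (10, 999), (11, 654)],
)

# Keeps the lowest original id per code; the ids are not renumbered.
with_min_id = conn.execute("""
    SELECT min(id) AS id, group_name_code
    FROM group_table
    GROUP BY group_name_code
    ORDER BY id
""").fetchall()

# Codes only, sorted by each code's first appearance (an aggregate sort key).
codes_only = [row[0] for row in conn.execute("""
    SELECT group_name_code
    FROM group_table
    GROUP BY group_name_code
    ORDER BY min(id)
""")]

print(with_min_id)
print(codes_only)
```

The first result keeps the original ids 1, 2, 3, 5, 7, 9 and 11, which is why a row number is needed to get 1 to 7.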

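You can get that result using a CTE; replace #t with your table name and value with group_name_code. Number the values by their smallest original id so the order matches your expected output:
; WITH tbl AS (
SELECT value, MIN(id) AS first_id FROM #t GROUP BY value
)
SELECT ROW_NUMBER() OVER (ORDER BY first_id) AS id, value FROM tbl
ORDER BY id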

Related

Select row in group with largest value in particular column postgres

I have a database table which looks like this.
id account_id action time_point
3 234 delete 100
1 656 create 600
1 4435 update 900
3 645 create 50
I need to group this table by id and select the row where time_point has the largest value.
Result table should look like this:
id account_id action time_point
3 234 delete 100
1 4435 update 900
Thanks for the help.
In Postgres, I would recommend distinct on to solve this top 1 per group problem:
select distinct on (id) *
from mytable
order by id, time_point desc
However, this returns exactly one row per id, so it does not handle possible ties. If ties are possible and you want all tied rows, rank() is a better solution:
select *
from (
select t.*, rank() over(partition by id order by time_point desc) rn
from mytable t
) t
where rn = 1
Or, if you are running Postgres 13 or later:
select *
from mytable t
order by rank() over(partition by id order by time_point desc)
fetch first row with ties
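The rank() subquery above can be sketched with Python's sqlite3 module as a stand-in for Postgres (an assumption; the DISTINCT ON and FETCH FIRST ... WITH TIES forms are Postgres syntax and are not used here). The extra tied row for id 1 is hypothetical and only shows that rank() keeps every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "action" is quoted because ACTION is an SQLite keyword.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, account_id INTEGER, "action" TEXT, time_point INTEGER)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(3, 234, "delete", 100), (1, 656, "create", 600),
     (1, 4435, "update", 900), (3, 645, "create", 50)],
)

# rank() restarts for every id; rank 1 is the row(s) with the largest time_point.
TOP_PER_ID = """
SELECT id, account_id, "action", time_point
FROM (
    SELECT t.*, rank() OVER (PARTITION BY id ORDER BY time_point DESC) AS rn
    FROM mytable t
) ranked
WHERE rn = 1
ORDER BY id, account_id
"""
result = conn.execute(TOP_PER_ID).fetchall()
print(result)

# Hypothetical extra row that ties with id 1's maximum time_point.
conn.execute("INSERT INTO mytable VALUES (1, 999, 'update', 900)")
result_with_tie = conn.execute(TOP_PER_ID).fetchall()
print(result_with_tie)
```

With the hypothetical tie, id 1 comes back twice; a DISTINCT ON or row_number() approach would keep only one of those rows.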
Check this:
select * from x
where exists (
select 1 from x xin
where xin.id = x.id
having max(xin.time_point) = x.time_point
);
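The correlated-subquery idea can be tried the same way. This sketch is a swapped-in equivalent rather than the exact query: it compares each row with the per-id maximum through a scalar subquery instead of EXISTS ... HAVING, because older SQLite releases reject HAVING without GROUP BY. SQLite is again only a stand-in for Postgres (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "action" is quoted because ACTION is an SQLite keyword.
conn.execute(
    'CREATE TABLE x (id INTEGER, account_id INTEGER, "action" TEXT, time_point INTEGER)'
)
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [(3, 234, "delete", 100), (1, 656, "create", 600),
     (1, 4435, "update", 900), (3, 645, "create", 50)],
)

# For each outer row, the correlated subquery computes the maximum time_point
# among rows with the same id; only rows equal to that maximum survive (ties included).
query = """
SELECT * FROM x
WHERE x.time_point = (
    SELECT max(xin.time_point) FROM x xin WHERE xin.id = x.id
)
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)
```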

Select by greatest sum, but without the sum in the result

I need to select the player(s) with the top combined score across all attempts, and I need to use a WITH clause.
create table scorecard(
id integer primary key,
player_name varchar(20));
create table scores(
id integer references scorecard,
attempt integer,
score numeric,
primary key(id, attempt));
Sample Data for scorecard:
id player_name
1 Bob
2 Steve
3 Joe
4 Rob
Sample data for scores:
id attempt score
1 1 50
1 2 45
2 1 10
2 2 20
3 1 40
3 2 35
4 1 0
4 2 95
The results would simply look like this:
player_name
Bob
Rob
The result would only be Bob if Rob had scored less than 95 in total. I've gotten as far as getting the name and total score in two columns using this:
select scorecard.player_name, sum(scores.score)
from scorecard
left join scores
on scorecard.id= scores.id
group by scorecard.player_name
order by sum(scores.score) desc;
But how do I get just the name of the player with the highest total score (or names, if tied)?
And remember, it should be using a WITH clause.
Whoever told you to "use a WITH clause" was missing a more efficient solution. To get just the (possibly multiple) winners:
SELECT c.player_name
FROM scorecard c
JOIN (
SELECT id, rank() OVER (ORDER BY sum(score) DESC) AS rnk
FROM scores
GROUP BY 1
) s USING (id)
WHERE s.rnk = 1;
In Postgres before version 12, a CTE is always materialized and acts as an optimization fence, so a plain subquery is typically faster. If you must use a WITH clause:
WITH top_score AS (
SELECT id, rank() OVER (ORDER BY sum(score) DESC) AS rnk
FROM scores
GROUP BY 1
)
SELECT c.player_name
FROM scorecard c
JOIN top_score s USING (id)
WHERE s.rnk = 1;
You could add a final ORDER BY c.player_name to get a stable sort order, but that's not requested.
The key feature of the query is that you can run a window function like rank() over the result of an aggregate function. Related:
Postgres window function and group by exception
Get the distinct sum of a joined table column
You can try something like the following:
WITH sumScoresTable AS (
SELECT id, sum(score) AS sum_scores
FROM scores
GROUP BY id
), maxScoresTable AS (
SELECT max(sum_scores) AS max_scores
FROM sumScoresTable
)
SELECT player_name
FROM scorecard
WHERE scorecard.id IN (SELECT sumScoresTable.id
FROM sumScoresTable
WHERE sumScoresTable.sum_scores = (SELECT maxScoresTable.max_scores FROM maxScoresTable));
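A working form of the two-CTE idea (per-player totals first, then the maximum of those totals) can be checked on the sample data with Python's sqlite3 module, used here as a stand-in for Postgres (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scorecard (id INTEGER PRIMARY KEY, player_name VARCHAR(20));
CREATE TABLE scores (
    id INTEGER REFERENCES scorecard,
    attempt INTEGER,
    score NUMERIC,
    PRIMARY KEY (id, attempt)
);
""")
conn.executemany("INSERT INTO scorecard VALUES (?, ?)",
                 [(1, "Bob"), (2, "Steve"), (3, "Joe"), (4, "Rob")])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [(1, 1, 50), (1, 2, 45), (2, 1, 10), (2, 2, 20),
                  (3, 1, 40), (3, 2, 35), (4, 1, 0), (4, 2, 95)])

# First CTE: total per player. Second CTE: the best total across all players.
query = """
WITH sumScoresTable AS (
    SELECT id, sum(score) AS sum_scores FROM scores GROUP BY id
), maxScoresTable AS (
    SELECT max(sum_scores) AS max_scores FROM sumScoresTable
)
SELECT player_name
FROM scorecard
WHERE scorecard.id IN (
    SELECT sumScoresTable.id FROM sumScoresTable
    WHERE sumScoresTable.sum_scores = (SELECT max_scores FROM maxScoresTable)
)
ORDER BY player_name
"""
winners = [row[0] for row in conn.execute(query)]
print(winners)
```

Bob and Rob both total 95, so both names come back.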
Try this code:
WITH CTE AS (
SELECT ID, RANK() OVER(ORDER BY SumScore DESC) As R
FROM (
SELECT ID, SUM(score) AS SumScore
FROM scores
GROUP BY ID ) AS sums
)
SELECT player_name
FROM scorecard
WHERE ID IN (SELECT ID FROM CTE WHERE R = 1)
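This sketch runs the RANK() version on the sample data in SQLite (a stand-in for Postgres, which is an assumption) and then repeats it after a hypothetical update that lowers Rob's second attempt to 90, matching the "only Bob" case from the question. The derived table is given an alias, which Postgres before version 16 and SQL Server require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scorecard (id INTEGER PRIMARY KEY, player_name VARCHAR(20));
CREATE TABLE scores (id INTEGER, attempt INTEGER, score NUMERIC,
                     PRIMARY KEY (id, attempt));
""")
conn.executemany("INSERT INTO scorecard VALUES (?, ?)",
                 [(1, "Bob"), (2, "Steve"), (3, "Joe"), (4, "Rob")])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [(1, 1, 50), (1, 2, 45), (2, 1, 10), (2, 2, 20),
                  (3, 1, 40), (3, 2, 35), (4, 1, 0), (4, 2, 95)])

# Sum per player in a derived table, rank the sums, keep rank 1 (ties share it).
query = """
WITH CTE AS (
    SELECT ID, RANK() OVER (ORDER BY SumScore DESC) AS R
    FROM (
        SELECT ID, SUM(score) AS SumScore
        FROM scores
        GROUP BY ID
    ) AS sums
)
SELECT player_name
FROM scorecard
WHERE ID IN (SELECT ID FROM CTE WHERE R = 1)
ORDER BY player_name
"""
winners = [row[0] for row in conn.execute(query)]
print(winners)

# Hypothetical change: Rob's second attempt drops to 90, so Bob wins alone.
conn.execute("UPDATE scores SET score = 90 WHERE id = 4 AND attempt = 2")
winners_after = [row[0] for row in conn.execute(query)]
print(winners_after)
```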

Only the distinct values after a group by in SQL Server 2014

Here is the sample of the data:
ID Value NumPeriod
------------------------
1681642 596.8 2
1681642 596.8 3
1681663 445.4 2
1681663 445.4 3
1681688 461.9 3
1681707 282.2 3
1681724 407.1 3
1681743 467 2
1681743 467 3
1681767 502 3
I want to group by [ID], take only the distinct values of [Value] within each group, and take the "first" distinct [Value] according to [NumPeriod]. So the result would look something like this:
ID Value NumPeriod
-------------------------
1681642 596.8 2
1681663 445.4 2
1681688 461.9 3
1681707 282.2 3
1681724 407.1 3
1681743 467 2
1681767 502 3
So I thought something like this would work, but no luck:
select
ID, distinct(Value), NumPeriod
from
MyTable
group by
ID, Value, NumPeriod
order by
ID, NumPeriod
Any help would be appreciated. Thanks!
You can use a ranking function and a CTE:
WITH CTE AS
(
SELECT ID, Value, NumPeriod,
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NumPeriod ASC)
FROM MyTable
)
SELECT ID, Value, NumPeriod
FROM CTE
WHERE RN = 1
ORDER BY ID, Value
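The ROW_NUMBER() approach can be tried outside SQL Server with Python's sqlite3 module (an assumption, used only as a stand-in). SQLite does not accept the T-SQL alias form RN = ROW_NUMBER() ..., so the sketch uses the standard ROW_NUMBER() ... AS RN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Value REAL, NumPeriod INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1681642, 596.8, 2), (1681642, 596.8, 3),
    (1681663, 445.4, 2), (1681663, 445.4, 3),
    (1681688, 461.9, 3), (1681707, 282.2, 3),
    (1681724, 407.1, 3), (1681743, 467, 2),
    (1681743, 467, 3), (1681767, 502, 3),
])

# Number the rows inside each ID by ascending NumPeriod and keep the first one.
query = """
WITH CTE AS (
    SELECT ID, Value, NumPeriod,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NumPeriod ASC) AS RN
    FROM MyTable
)
SELECT ID, Value, NumPeriod
FROM CTE
WHERE RN = 1
ORDER BY ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(*row)
```

The output matches the seven rows in the expected result from the question.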
I think all you have to change is where you call distinct.
Try this:
select distinct ID, Value, NumPeriod
from MyTable
group by ID, Value, NumPeriod
order by ID, NumPeriod

How to get only one record for each duplicate rows of the id in oracle?

Suppose I have this table:
group_id | image | image_id |
-----------------------------
23 blob 1
23 blob 2
23 blob 3
21 blob 4
21 blob 5
25 blob 6
25 blob 7
How do I get only one row for each group_id? There may be multiple images for one group_id, and I just want one result per group_id.
I tried DISTINCT, but then I only get group_id. MAX on image would not work either.
There are no standard aggregate functions in Oracle that would work with BLOBs, so GROUP BY solutions won't work.
Try this one based on ROW_NUMBER() in a sub-query.
SELECT inn.group_id, inn.image, inn.image_id
FROM
(
SELECT t.group_id, t.image, t.image_id,
ROW_NUMBER() OVER (PARTITION BY t.group_id ORDER BY t.image_id) num
FROM theTable t
) inn
WHERE inn.num = 1;
The above should return the first (based on image_id) row for each group.
SELECT group_id, image, image_id
FROM a_table
WHERE (group_id, image_id) IN
(
SELECT group_id, MIN(image_id)
FROM a_table
GROUP BY
group_id
)
;
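The tuple-IN query can be tried in SQLite (row values are supported from SQLite 3.15), used here as a stand-in for Oracle (an assumption). The image bytes are hypothetical placeholders for the BLOBs in the question; the query never aggregates the BLOB column itself, only image_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (group_id INTEGER, image BLOB, image_id INTEGER)")
# Placeholder bytes stand in for the real image BLOBs (hypothetical data).
conn.executemany("INSERT INTO a_table VALUES (?, ?, ?)", [
    (23, b"blob-1", 1), (23, b"blob-2", 2), (23, b"blob-3", 3),
    (21, b"blob-4", 4), (21, b"blob-5", 5),
    (25, b"blob-6", 6), (25, b"blob-7", 7),
])

# Find the smallest image_id per group, then fetch those full rows via a row-value IN.
query = """
SELECT group_id, image, image_id
FROM a_table
WHERE (group_id, image_id) IN (
    SELECT group_id, MIN(image_id)
    FROM a_table
    GROUP BY group_id
)
ORDER BY group_id
"""
result = conn.execute(query).fetchall()
print(result)
```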
select * from
(select t1.*,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY group_id desc) as seqnum
from tablename t1)
where seqnum=1;

Getting rows with duplicate column values

I tried solutions available online for this, but none worked for me.
Table :
Id rank
1 100
1 100
2 75
2 45
3 50
3 50
I want ids 1 and 3 returned, because they have duplicates.
I tried something like
select * from A where rank in (
select rank from A group by rank having count(rank) > 1)
This also returned ids without any duplicates. Please help.
Try this:
select id from table
group by id, rank
having count(*) > 1
select id, rank
from
(
select id, rank, count(*) cnt
from rank_tab
group by id, rank
having count(*) > 1
) t
This general idea should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > 1 AND COUNT(DISTINCT rank) = 1
In plain English: get every id that exists in multiple rows, where all of these rows have the same value in rank.
If you want ids that have some duplicated ranks (but not necessarily all), something like this should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > COUNT(DISTINCT rank)
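The two HAVING conditions differ only when an id mixes duplicated and non-duplicated ranks. This sketch runs both in SQLite (a stand-in database, assumed), using the question's rows plus a hypothetical id 4 with ranks 10, 10 and 20 to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rank" is quoted defensively so it is never confused with the rank() window function.
conn.execute('CREATE TABLE your_table (id INTEGER, "rank" INTEGER)')
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [
    (1, 100), (1, 100), (2, 75), (2, 45), (3, 50), (3, 50),
    # Hypothetical id 4: one duplicated rank plus one different rank.
    (4, 10), (4, 10), (4, 20),
])

# ids with more than one row, all sharing a single rank value.
all_same = [r[0] for r in conn.execute("""
    SELECT id FROM your_table
    GROUP BY id
    HAVING COUNT(*) > 1 AND COUNT(DISTINCT "rank") = 1
    ORDER BY id
""")]

# ids with at least one repeated rank, even if other ranks differ.
some_dup = [r[0] for r in conn.execute("""
    SELECT id FROM your_table
    GROUP BY id
    HAVING COUNT(*) > COUNT(DISTINCT "rank")
    ORDER BY id
""")]

print(all_same, some_dup)
```

On the question's rows alone both queries return 1 and 3; the hypothetical id 4 appears only in the second list.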