How to use 'DISTINCT ON' to query last row of duplicate rows? - sql

I came across this tutorial on the DISTINCT ON () query. To achieve the output, the following query was used:
SELECT DISTINCT ON (bcolor) bcolor, fcolor
FROM t1
ORDER BY bcolor, fcolor;
The result is illustrated here, where the top row of each set of duplicated rows is returned. However, is there a way to return the bottom row of the duplicated rows instead? My use case is that multiple applications/registrations might be entered, and I want to query the most recent row for each distinct application.
Appreciate your time, cheers!

I think you just want a descending sort on the second key:
SELECT DISTINCT ON (bcolor) bcolor, fcolor
FROM t1
ORDER BY bcolor, fcolor DESC;

Assuming you have a date or id column that increments to indicate which row is the most recent, simply order by that column and add DESC to sort in descending order.
SELECT DISTINCT ON (student_id) student_id, student_name, registration_date
FROM t1
ORDER BY student_id, registration_date DESC;
(In Postgres, the DISTINCT ON expressions must match the leftmost ORDER BY expressions, so the descending date goes last in the ORDER BY.)

Related

How to pick first record from the duplicates, With only duplicate column values

Here is the situation: I have a table in BigQuery like the following.
As shown in the table, records 1 and 3 have the same id but a different first_name (say the person with id 1 changed his first_name); all other fields are the same in both of those records. Now I need to select one record out of those 2. How can I do that? I tried a self join, but that discards both of the records; GROUP BY will not work because the record is not a duplicate, only the id is duplicated, and the same goes for DISTINCT.
Thanks!!!!
The query I am using right now is
select * from table t group by 1,2,3,4,5;
You can use the ROW_NUMBER function to assign row numbers to each of the records in the table.
select *
from (
  select *, ROW_NUMBER() OVER (PARTITION BY t.id) rn
  from t
)
where rn = 1
ROW_NUMBER does not require an ORDER BY clause. It returns the sequential row ordinal (1-based) of each row within its partition. If the ORDER BY clause is unspecified, the result is non-deterministic.
If you have a record created date or modified date, you can use it in the ORDER BY clause to always pick up the latest record, as in the sketch below.
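A minimal sketch, assuming a modified_date column (a hypothetical name, not shown in the question), that keeps the most recently modified row per id:
select *
from (
  select *,
         -- modified_date is an assumed column indicating when the row was last changed
         ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.modified_date DESC) rn
  from t
)
where rn = 1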
SQL tables represent unordered sets. There is no first row unless you have a column that specifies the ordering. Let me assume you have such a column.
If you want a particular row, you can use aggregation with an order by:
select array_agg(t order by ? asc limit 1)[ordinal(1)].*
from t
group by id;
? is the column that specifies the ordering.
You can also leave out the order by:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by id;
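For instance, assuming the ordering column were a created_at timestamp (a hypothetical name, not given in the question), the latest row per id would look like:
select array_agg(t order by created_at desc limit 1)[ordinal(1)].*
from t
group by id;
-- created_at is only an assumed column; substitute whatever column defines "latest" in your table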

Calculated sum/count by category and sort descending, together with an inner join

I have a simple table: one column with a variable which I want to sum or count, and another one with a category. I tried this:
SELECT COUNT(*) AS counted, category
FROM mytable
GROUP BY category
ORDER BY counted DESC;
Without the ORDER BY counted DESC it works; however, it is not sorted. I would like to see the maximum immediately, so I want to sort in descending order. However, when running the query, a message pops up and asks me to enter a value for counted. Why can't I do this in one step, and why is this not working?
Same for sum:
SELECT sum(variable) AS calcsum, category
FROM mytable
GROUP BY category
ORDER BY calcsum DESC;
Furthermore, I have the same or a similar problem when trying to do this in one step with a join. I have one table with provided IDs (a variable called keys), and another table with IDs, a category, a filter variable, and a score. I want the sum of score per category, sorted descending. So far I have:
SELECT SUM(score) AS calcsum, category
FROM (
SELECT keys, category, filter, score INTO newdataset
FROM table1 INNER JOIN table2 ON table1.keys=table2.ID
WHERE table2.filter="Value")
GROUP BY category;
And here again I thought to add: ORDER BY calcsum DESC
However, even without adding the ORDER BY I get the error message "An action query cannot be used as a row source". So what is my mistake here?
Just repeat the COUNT(*) expression:
SELECT COUNT(*) AS counted, category
FROM mytable
GROUP BY category
ORDER BY COUNT(*) DESC;
EDIT:
If you want this with INTO and JOINs:
SELECT SUM(score) AS calcsum, category
INTO newdataset
FROM table1 INNER JOIN
     table2
     ON table1.keys = table2.ID
WHERE table2.filter = "Value"
GROUP BY category
ORDER BY SUM(score) DESC;
You can simply use ORDER BY 1 DESC; the number refers to the position of the column in your SELECT statement (here, counted is the first column).
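Applied to the first query, a sketch using the positional form would be:
SELECT COUNT(*) AS counted, category
FROM mytable
GROUP BY category
ORDER BY 1 DESC;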

How to select all columns and count from a table?

I'm trying to select all columns in the table top_teams_team as well as get a count of values for the hash_value column. The SQL statement here is partially working, in that it returns two columns, hash_value and total. I still want it to give me all the columns of the table as well.
select hash_value, count(hash_value) as total
from top_teams_team
group by hash_value
In the SQL statement below, it gives me all the columns, but there are duplicate hash_value rows being displayed, which isn't what I want. I tried putting the DISTINCT keyword in, but it wasn't working correctly, or maybe I'm not putting it in the right place.
select *
from top_teams_team
inner join (
select hash_value, count(hash_value) as total
from top_teams_team
group by hash_value
) q
on q.hash_value = top_teams_team.hash_value
A combination of a window function with DISTINCT ON might do what you are looking for:
SELECT DISTINCT ON (hash_value)
*, COUNT(*) OVER (PARTITION BY hash_value) AS total_rows
FROM top_teams_team
-- ORDER BY hash_value, ???
;
DISTINCT ON is applied after the window function, so Postgres first counts rows per distinct hash_value before picking the first row per group (incl. that count).
The query picks an arbitrary row from each group. If you want a specific one, add ORDER BY expressions accordingly.
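For example, assuming the table had a created_at timestamp column (a hypothetical name, not in the question), the newest row per hash_value would be picked like this:
SELECT DISTINCT ON (hash_value)
       *, COUNT(*) OVER (PARTITION BY hash_value) AS total_rows
FROM   top_teams_team
ORDER  BY hash_value, created_at DESC;  -- created_at is an assumed column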
This is not "a count of values for the hash_value column" but a count of rows per distinct hash_value. I guess that's what you meant.
Detailed explanation:
Best way to get result count before LIMIT was applied
Select first row in each GROUP BY group?
Depending on undisclosed information there may be (much) faster query styles ...
Optimize GROUP BY query to retrieve latest row per user
I am assuming that you are getting duplicate columns when you say: "but there are duplicate hash_value rows being displayed".
select q.hash_value, q.total, ttt.field1, ttt.field2, ttt.field3
from top_teams_team ttt
join (
  select hash_value, count(hash_value) as total
  from top_teams_team
  group by hash_value
) q
on q.hash_value = ttt.hash_value
Try using COUNT as an analytic function:
SELECT *, COUNT(*) OVER (PARTITION BY hash_value) total
FROM top_teams_team;

Querying a table for a minimum based on the largest of a second column

I have a subquery that has the following
PreviousRateCode, CurrentRateCode, PreviousReportDate, CurrentReportDate, TransactionDate.
The data in my subquery looks like this:
How can I query the results below to get the smallest "RecentDerogMonths" based on the largest "WorstDerogLevel"?
There's an additional column that I omitted that has a customerID, so I need the smallest "RecentDerogMonths" based on the largest "WorstDerogLevel" for each customer. This is a subquery of a larger select, so what I need is something like
Select Lowest Months by highest level
From (The result above)
Group by CustomerID
Is this what you want?
select top 1 t.*
from t
order by WorstDerogLevel desc, RecentDerogMonths asc;
If you want all matching rows that meet the condition, then use SELECT TOP (1) WITH TIES.
EDIT:
Per customer, you would use window functions. One method uses a subquery (a sketch of that follows below), but here is a cool trick:
select top (1) with ties t.*
from t
order by row_number() over (partition by customerid order by WorstDerogLevel desc, RecentDerogMonths asc);
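For reference, the subquery method mentioned above would be a sketch along these lines:
select *
from (
    select t.*,
           row_number() over (partition by customerid
                              order by WorstDerogLevel desc, RecentDerogMonths asc) as rn
    from t
) s
where rn = 1;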

Select last duplicate row with different id Oracle 11g

I have a table that looks like this:
The problem is that I need to get the last record among the duplicates in the column "NRODENUNCIA".
You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1
There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIAID in the OVER clause. This defines the "last record" as the one that has the largest DENUNCIAID. If you want to define it differently, you'd need to change the column that is being ordered.
with dupes as (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) RN,
    t.*
  FROM YourTable t
)
SELECT * FROM dupes WHERE rn = 1
This only gets the last record per group of duplicates.
If you want to include only records that have duplicates, then you change the WHERE clause to
WHERE rn = 1
  and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)
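Putting that together, a sketch of the full query restricted to duplicated NRODENUNCIA values:
with dupes as (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) RN,
    t.*
  FROM YourTable t
)
SELECT *
FROM dupes
WHERE rn = 1
  and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)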