PostgreSQL DISTINCT ON with different ORDER BY - sql

I want to run this query:
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY purchases.purchased_at DESC
But I get this error:
PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Adding address_id as first ORDER BY expression silences the error, but I really don't want to add sorting over address_id. Is it possible to do without ordering by address_id?

Documentation says:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
Official documentation
So you'll have to add the address_id to the order by.
Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id and that result sorted by purchased_at then you're trying to solve a greatest N per group problem which can be solved by the following approaches:
The general solution that should work in most DBMSs:
SELECT t1.* FROM purchases t1
JOIN (
SELECT address_id, max(purchased_at) max_purchased_at
FROM purchases
WHERE product_id = 1
GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
A more PostgreSQL-oriented solution based on #hkf's answer:
SELECT * FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC
) t
ORDER BY purchased_at DESC
Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another

A subquery can solve it:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
) p
ORDER BY purchased_at DESC;
Leading expressions in ORDER BY have to agree with columns in DISTINCT ON, so you can't order by different columns in the same SELECT.
Only use an additional ORDER BY in the subquery if you want to pick a particular row from each set:
SELECT *
FROM (
SELECT DISTINCT ON (address_id) *
FROM purchases
WHERE product_id = 1
ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id
) p
ORDER BY purchased_at DESC;
If purchased_at can be NULL, use DESC NULLS LAST - and match your index for best performance. See:
Sort by column ASC, but NULL values first?
Why does ORDER BY NULLS LAST affect the query plan on a primary key?
Related, with more explanation:
Select first row in each GROUP BY group?
Sort by column ASC, but NULL values first?

You can order by address_id in an subquery, then order by what you want in an outer query.
SELECT * FROM
(SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC )
ORDER BY purchased_at DESC

Window function may solve that in one pass:
SELECT DISTINCT ON (address_id)
LAST_VALUE(purchases.address_id) OVER wnd AS address_id
FROM "purchases"
WHERE "purchases"."product_id" = 1
WINDOW wnd AS (
PARTITION BY address_id ORDER BY purchases.purchased_at DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

For anyone using Flask-SQLAlchemy, this worked for me
from app import db
from app.models import Purchases
from sqlalchemy.orm import aliased
from sqlalchemy import desc
stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases')
alias = aliased(Purchases, stmt)
distinct = db.session.query(alias)
distinct.order_by(desc(alias.purchased_at))

It can also be solved using the following query along with other answers.
WITH purchase_data AS (
SELECT address_id, purchased_at, product_id,
row_number() OVER (PARTITION BY address_id ORDER BY purchased_at DESC) AS row_number
FROM purchases
WHERE product_id = 1)
SELECT address_id, purchased_at, product_id
FROM purchase_data where row_number = 1

You can also done this by using group by clause
SELECT purchases.address_id, purchases.* FROM "purchases"
WHERE "purchases"."product_id" = 1 GROUP BY address_id,
purchases.purchased_at ORDER purchases.purchased_at DESC

Related

Distinct rows in a table in sql

I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1

Need to change LIMIT into something else

Is there a way to change "LIMIT 1" and get the same output? I have to get client's name, surname and a quantity of books that has the most books
SELECT stud.skaitytojas.name, stud.skaitytojas.surname,
COUNT (stud.skaitytojas.nr) AS quantity
FROM stud.egzempliorius , stud.skaitytojas
WHERE stud.egzempliorius.client = stud.skaitytojas.nr
GROUP BY stud.skaitytojas.nr
ORDER BY quantity DESC
LIMIT 1
Postgres supports the ANSI standard FETCH FIRST 1 ROW ONLY, so you can do:
SELECT s.name, s.surname, COUNT(s.nr) AS quantity
FROM stud.egzempliorius e JOIN
stud.skaitytojas s
ON e.client = s.nr
GROUP BY s.name, s.surname
ORDER BY quantity DESC
FETCH FIRST 1 ROW ONLY;
Also notice the use of table aliases and proper JOIN syntax. I also prefer to list the columns in the SELECT in the GROUP BY, although that is optional if s.nr is unique.
You can select the row with the highest quantity using row_number()
SELECT * FROM (
SELECT * , row_number() over (order by quantity desc) rn FROM (
SELECT stud.skaitytojas.name, stud.skaitytojas.surname, COUNT (stud.skaitytojas.nr) AS quantity
FROM stud.egzempliorius , stud.skaitytojas
WHERE stud.egzempliorius.client = stud.skaitytojas.nr
GROUP BY stud.skaitytojas.name, stud.skaitytojas.surname
) t
) t where rn = 1
If you want to include ties for the highest quantity, then use rank() instead.

How to I do multiple columns partitioning with the rows being duplicated?

I have a set of SQL Stored procedure to use partitioning for my ranking to get percentile. by doing the below partitioning I am able to get my percentiles data right. However my problem is there are duplicates in each row. E.g for each DESC there are multiple duplicates when it is suppose to be only 1 row. Why is this so?
row_nums AS
(
SELECT DATE, DESC, NUM, ROW_NUMBER() OVER (PARTITION BY DATE, DESC ORDER BY NUM ASC) AS Row_Num
FROM ******
)
SELECT .................
This is the output I get currently: (Where there are duplicate rows being returned - Refer to Row 6 to 8)
http://i.stack.imgur.com/foe7g.png[^]
This is the output I want to achieve: http://i.stack.imgur.com/GkrHP.png[^]
You can remove duplicate by adding one more INNER query in FROM clause like below:
;WTIH row_nums AS
(
SELECT DATE, DESC, NUM, ROW_NUMBER() OVER (PARTITION BY DATE, DESC ORDER BY NUM ASC) AS Row_Num
FROM (
SELECT your required columns, COUNT(duplicated_rows_columnsname)
FROM ***
GROUP BY columnnames
HAVING COUNT(duplicated_rows_columnsname) = 1
)
)
SELECT .................
However, You can also remove duplicate row using DISTINCT clause in INNER. query.

How to reorder a result when using TOP clause?

I used this code to select last 5 rows from table:
SELECT TOP(5) * FROM tbl_reg ORDER BY Id DESC
But I want to reorder this result ( the last five rows returned ) By Id ASC
How can I do this?
Use a subquery:
select t.*
from (SELECT TOP(5) * FROM tbl_reg ORDER BY Id DESC) t
order by id ASC;

How to reverse the table that comes from SQL query which already includes ORDER BY

Here is my query:
SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC
This the result:
How can I reverse this table based on date (Column2) by using SQL?
You can use the first query to get the matching ids, and use them as part of an IN clause:
SELECT id, rssi1, date
FROM history
WHERE id IN
(
SELECT TOP 8 id
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC
)
ORDER BY date ASC
You could simply use a sub-query. If you apply a TOP clause the nested ORDER BY is allowed:
SELECT X.* FROM(
SELECT TOP 8 id, Column1, Column2
FROM dbo.History
WHERE (siteName = 'CCL03412')
ORDER BY id DESC) X
ORDER BY Column2
Demo
The SELECT query of a subquery is always enclosed in parentheses. It
cannot include a COMPUTE or FOR BROWSE clause, and may only include an
ORDER BY clause when a TOP clause is also specified.
Subquery Fundamentals
try the below :
select * from (SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC ) aa order by aa.date DESC
didn't run it, but i think it should go well
WITH cte AS
(
SELECT id, rssi1, date, RANK() OVER (ORDER BY ID DESC) AS Rank
FROM history
WHERE (siteName = 'CCL03412')
)
SELECT id, rssi1, date
FROM cte
WHERE Rank <= 8
ORDER BY Date DESC
I have not run this but i think it will work. Execute and let me know if you face error
select id, rssi1, date from (SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC) order by date ;