Select Distinct on one column, without ordering by that column - sql

I'm trying to select only the IDs of a table that I'm querying on, and still be able to specify ordering on other columns.
First I tried simply doing:
SELECT DISTINCT countries.id
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC
That doesn't work. It fails with the error: for SELECT DISTINCT, ORDER BY expressions must appear in select list.
If I add province_infos.population and country_infos.population, it works, but I then get duplicate IDs, which I cannot have.
To resolve this, I attempted using DISTINCT ON():
SELECT DISTINCT ON (countries.id)
countries.id, country_infos.population, province_infos.population
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC
That then gives me the error SELECT DISTINCT ON expressions must match initial ORDER BY expressions. I can't SELECT DISTINCT ON a column without ordering it too.
It seems the only way to make this work is to do something like:
SELECT DISTINCT ON (countries.id)
countries.id
FROM countries
...
ORDER BY countries.id DESC, province_infos.population DESC, country_infos.population ASC
Unfortunately I can't do this, because ordering by the IDs first skews the results of the other orderings. And it seems the only way to avoid ordering by the IDs is to remove the DISTINCT from the select, but then I get duplicates.
Anyone know how I can work around this?
EDIT:
The ... I omitted shouldn't be relevant, but in case you want to see:
JOIN country_infos ON country_infos.country_refer = countries.id
JOIN languages ON languages.country_refer = countries.id
JOIN provinces ON provinces.country_refer = countries.id
JOIN province_infos ON province_infos.province_refer = provinces.id
WHERE country_infos.population > 10.3
AND languages.alphabet = 'Latin'
And I'm not just trying to get this working for this specific query. This is just an example I'm using to explain the predicament. I'm generating these kinds of queries automatically off of an arbitrary data structure.

The general answer to your question is that when you use DISTINCT ON (x, ...) in a SELECT statement in PostgreSQL, the database sorts by the values in the distinct clause to make it easy to tell whether rows have distinct values. Once the rows are ordered by those values, the db can remove duplicates in a single pass, comparing only adjacent rows. Because of this, the db forces you to sort by the same columns that appear in the distinct clause first.
You can work around this by making your original query a subquery, like so:
SELECT t.id FROM
(SELECT DISTINCT ON (countries.id) countries.id
, province_infos.population
, country_infos.founding_date
FROM countries
...
ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date ASC
)t
ORDER BY t.population DESC, t.founding_date ASC

Use GROUP BY, something like this:
SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY MAX(pi.population) DESC, MAX(ci.population) ASC;
Actually, given the nature of your problem, you might want SUM():
SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY SUM(pi.population) DESC, SUM(ci.population) ASC;
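To see the difference between the two aggregates, here is a minimal sketch that runs the GROUP BY approach against a tiny in-memory database. The tables are a cut-down, hypothetical version of the schema in the question, the sample rows are made up, and SQLite (through Python's sqlite3 module) is only a stand-in for PostgreSQL. The alias pinfo and the helper country_ids are my own names.

```python
import sqlite3

# Hypothetical miniature of the question's schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY);
    CREATE TABLE provinces (id INTEGER PRIMARY KEY, country_refer INTEGER);
    CREATE TABLE province_infos (province_refer INTEGER, population REAL);
    INSERT INTO countries VALUES (1), (2), (3);
    INSERT INTO provinces VALUES (10, 1), (11, 1), (20, 2), (30, 3), (31, 3);
    INSERT INTO province_infos VALUES (10, 5), (11, 8), (20, 50), (30, 20), (31, 40);
""")

def country_ids(aggregate):
    # aggregate is "MAX" or "SUM"; it is interpolated only because it is a
    # fixed keyword chosen in this script, never user input.
    sql = f"""
        SELECT c.id
        FROM countries c
        JOIN provinces p ON p.country_refer = c.id
        JOIN province_infos pinfo ON pinfo.province_refer = p.id
        GROUP BY c.id
        ORDER BY {aggregate}(pinfo.population) DESC
    """
    return [row[0] for row in conn.execute(sql)]

ids_by_max = country_ids("MAX")  # country with the largest single province first
ids_by_sum = country_ids("SUM")  # country with the largest province total first
print(ids_by_max, ids_by_sum)
```

Each ID comes back exactly once in both cases, but the order differs, so which aggregate you pick depends on what "ordered by province population" should mean for a country with several provinces.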

Related

SAS SQL SELECT DISTINCT WITH GROUP BY

What if a SQL code as below?
Proc SQL;
SELECT DISTINCT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
QUIT;
This uses DISTINCT together with GROUP BY. Can this combination cause an error, or is DISTINCT just a redundant word?
Thanks~
Use DISTINCT with GROUP BY. Any possible error will occur when using this combination? Or DISTINCT just a redundant word?
This won't error, but it is unnecessary redundancy. GROUP BY ID guarantees that each ID appears on only one row in the result set. There is no benefit to adding DISTINCT here, and it makes the intent of the query harder to understand.
On the other hand, there are situations where you would use DISTINCT without GROUP BY: typically when you want to deduplicate a set of columns, but do not need to use aggregate functions (SUM(), COUNT()...).
SELECT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
We already group by ID, so there is no need for DISTINCT.
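A quick way to convince yourself is to run both forms and compare. This is only a sketch: the rows are made up, SQLite via Python's sqlite3 module stands in for SAS PROC SQL, and the column units is a hypothetical replacement for NO (renamed here to avoid clashing with an SQL keyword).

```python
import sqlite3

# Made-up sample data; "units" is a hypothetical stand-in for the NO column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_list (id INTEGER, amount REAL, units INTEGER);
    INSERT INTO customer_list VALUES
        (1, 10.0, 2), (1, 5.0, 1), (2, 7.5, 4), (3, 1.0, 1), (3, 2.0, 1);
""")

QUERY = """
    SELECT {distinct} id, SUM(amount) AS m, SUM(units) AS cnt
    FROM customer_list
    GROUP BY id
    ORDER BY cnt DESC
"""

# Same query with and without the DISTINCT keyword.
with_distinct = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(QUERY.format(distinct="")).fetchall()
print(with_distinct == without_distinct)
```

Both lists hold one row per ID, in the same order.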

Order by DESC reverse result

I'm retrieving some data in SQL, order by DESC. I then want to reverse the result. I was doing this by pushing the data into an array and then using array_reverse, but I am finding it's quite taxing on CPU time and would like to simply use the correct SQL query.
I've looked at this thread SQL Server reverse order after using desc, but I cannot seem to make it work with my query.
SELECT live.message,
live.sender,
live.sdate,
users.online
FROM live, users
WHERE users.username = live.sender
ORDER BY live.id DESC
LIMIT 15
You can place your query into a subquery and then reverse the order:
SELECT t.message,
t.sender,
t.sdate,
t.online
FROM
(
SELECT live.id,
live.message,
live.sender,
live.sdate,
users.online
FROM live
INNER JOIN users
ON users.username = live.sender
ORDER BY live.id DESC
LIMIT 15
) t
ORDER BY t.id ASC
You'll notice that I replaced your implicit JOIN with an explicit INNER JOIN. It is generally considered undesirable to use commas in the FROM clause (q.v. the ANSI-92 standard) because it makes the query harder to read.
You could wrap your query with another query and order by with asc. Since you want to order by live.id, you must include it in the inner query so the outer one can sort by it:
SELECT message, sender, sdate, online
FROM (SELECT live.message, live.sender, live.sdate, users.online, live.id
FROM live, users
WHERE users.username = live.sender
ORDER BY live.id DESC
LIMIT 15) t
ORDER BY id ASC
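Here is a minimal sketch of that pattern you can run locally. It assumes a stripped-down live table with made-up rows, drops the join to users for brevity, and uses SQLite through Python's sqlite3 module as a stand-in for your database.

```python
import sqlite3

# Stripped-down, hypothetical version of the "live" table with five messages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO live (id, message) VALUES (?, ?)",
    [(i, f"msg {i}") for i in range(1, 6)],
)

# Inner query: the newest 3 rows. Outer query: put them back in ascending order.
latest_ascending = conn.execute("""
    SELECT t.id, t.message
    FROM (
        SELECT id, message
        FROM live
        ORDER BY id DESC
        LIMIT 3
    ) t
    ORDER BY t.id ASC
""").fetchall()
print(latest_ascending)
```

The database does the reversal, so there is nothing left for array_reverse to do on the application side.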

SQL Order By using concat

I'm concatenating two fields and I only want to order by the second field (p.organizationname). Is that possible?
I'm displaying this field so I need a solution that doesn't include me having to select the fields separately.
Here is what i have so far:
SELECT distinct Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi...
Order By concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
Use GROUP BY and ORDER BY an aggregate instead of DISTINCT:
SELECT Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi...
GROUP BY Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
Order By MAX(p.ORGANIZATIONNAME)
The problem can be illustrated with an example:
ID Col1
1 Dog
1 Cat
2 Horse
Distinct ID? Easy: 1,2
Distinct ID Order by Col1... wait.. which value of Col1 should SQL use? SQL is confused and angry.
Since you are using a concatenation of two fields and want to sort by one of those fields, you could also include the sort field in a DISTINCT subquery and then ORDER BY the sort field without including it in your SELECT list.
Because you are using DISTINCT, any ORDER BY expression must also appear in the SELECT list. You can use a subquery to get the same result in your case, since the distinct values stay the same when you add p.ORGANIZATIONNAME:
SELECT a
FROM( SELECT distinct Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME) a,
p.ORGANIZATIONNAME b
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi... ) t
order by b
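A small runnable sketch of the subquery approach, under some loud assumptions: a single hypothetical table folder_people replaces the elided joins, the rows are made up, SQLite via Python's sqlite3 module stands in for your database, and the || operator is used in place of Concat().

```python
import sqlite3

# Hypothetical single table standing in for the joined PEOPLE/FOLDER rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folder_people (referencefile TEXT, organizationname TEXT);
    INSERT INTO folder_people VALUES
        ('F2', 'Acme'), ('F1', 'Zeta'), ('F2', 'Acme'), ('F3', 'Beta');
""")

# The inner query deduplicates on (a, b); the outer query sorts by b only
# and returns just the concatenated value.
ordered_by_org = [row[0] for row in conn.execute("""
    SELECT a
    FROM (
        SELECT DISTINCT referencefile || ',' || organizationname AS a,
                        organizationname AS b
        FROM folder_people
    ) t
    ORDER BY b
""")]
print(ordered_by_org)
```

The duplicate F2/Acme row collapses to one, and the result is sorted by organization name rather than by the reference file at the front of the string.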

How can I resolve the distinct issue in SQL Server 2005?

I am trying to get distinct values from my query. I tried the query below, but I am not getting the expected result. Can anyone suggest how to resolve the issue?
I want part_id to be distinct.
http://tinypic.com/view.php?pic=9scx21&s=8#.UupFqT2SzyQ
Thanks in advance.
Why do you think the result is not correct? The rows returned are distinct.
DISTINCT is applied to all the columns. There is no such thing as "give me DISTINCT(p.part_id) and don't care about the other columns".
What you probably want is a single row for each part_id.
If you don't have any rules which row you want to be returned you can go with a ROW_NUMBER:
select *
from
(
select all your columns
, row_number() over (partition by p.part_id order by p.part_id) as rn
from ....
where ...
) as dt
where rn = 1
If there are some rules to determine which row should be returned (oldest/newest/whatever), you simply ORDER BY that column (DESC for newest first) instead of ORDER BY p.part_id inside the OVER clause.
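As a rough illustration of the "newest row per part" rule, here is a sketch with a hypothetical part_rows table and made-up data. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and assumes the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

# Hypothetical table with several rows per part_id; dates are ISO strings so
# they sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_rows (part_id INTEGER, supplier TEXT, updated TEXT);
    INSERT INTO part_rows VALUES
        (1, 'A', '2014-01-01'), (1, 'B', '2014-01-20'),
        (2, 'C', '2014-01-05'), (2, 'D', '2014-01-02');
""")

# Number the rows inside each part_id, newest first, then keep only rn = 1.
newest_per_part = conn.execute("""
    SELECT part_id, supplier
    FROM (
        SELECT part_id, supplier,
               ROW_NUMBER() OVER (PARTITION BY part_id ORDER BY updated DESC) AS rn
        FROM part_rows
    ) AS dt
    WHERE rn = 1
    ORDER BY part_id
""").fetchall()
print(newest_per_part)
```

Changing the ORDER BY inside OVER (...) is all it takes to switch the rule, for example to updated ASC for the oldest row.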
Change the query to start with SELECT DISTINCT P.PART_ID FROM.. and add GROUP BY p.part_id at the end.
DISTINCT applies to all the selected columns together, so you can add more columns, but remember to add them to the GROUP BY as well.

Is order in a subquery guaranteed to be preserved?

I am wondering in particular about PostgreSQL. Given the following contrived example:
SELECT name FROM
(SELECT name FROM people WHERE age >= 18 ORDER BY age DESC) p
LIMIT 10
Are the names returned from the outer query guaranteed to be in the order they were for the inner query?
No, put the order by in the outer query:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC
LIMIT 10
The inner (sub) query returns a result-set. If you put the order by there, then the intermediate result-set passed from the inner (sub) query, to the outer query, is guaranteed to be ordered the way you designate, but without an order by in the outer query, the result-set generated by processing that inner query result-set, is not guaranteed to be sorted in any way.
For simple cases, @Charles's query is most efficient.
More generally, you can use the window function row_number() to carry any order you like to the main query, including:
order by columns not in the SELECT list of the subquery and thus not reproducible
arbitrary ordering of peers according to ORDER BY criteria. Postgres will reuse the same arbitrary order in the window function within the subquery. (But not truly random order from random() for instance!)
If you don't want to preserve arbitrary sort order of peers from the subquery, use rank() instead.
This may also be generally superior with complex queries or multiple query layers:
SELECT p.name
FROM (
SELECT name, row_number() OVER (ORDER BY <same order by criteria>) AS rn
FROM people
WHERE age >= 18
ORDER BY <any order by criteria>
) p
ORDER BY p.rn
LIMIT 10;
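A concrete version of that template, as a sketch only: the people rows are made up, the angle-bracket placeholder is filled in with age DESC, and SQLite (3.25+ for window functions) via Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

# Made-up sample data for the "people" table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, age INTEGER);
    INSERT INTO people VALUES ('Ann', 30), ('Bob', 17), ('Cid', 45), ('Dee', 22);
""")

# The subquery exposes only name and rn; the outer query sorts by rn, so the
# order is explicit even though age itself is not passed to the outer level.
oldest_adults = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM (
        SELECT name, row_number() OVER (ORDER BY age DESC) AS rn
        FROM people
        WHERE age >= 18
    ) p
    ORDER BY p.rn
    LIMIT 2
""")]
print(oldest_adults)
```

The outer ORDER BY p.rn is what makes the order a guarantee rather than an accident of the plan.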
They are not guaranteed to be in the same order, though when you run it you might see that the result generally follows that order.
You should place the ORDER BY on the main query, and select age in the subquery so the outer query can sort by it:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC LIMIT 10