ORDER BY the IN value list - sql

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids 1, 2, 3, 4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How can I achieve that?

You can do it quite easily with VALUES (), (), which was introduced in PostgreSQL 8.2.
The syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering

In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery; we can use the set-returning function like a table directly (a.k.a. "table function").
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
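For illustration, the same pattern with a text key might look like this (a minimal sketch, assuming a hypothetical tags table keyed by a text column name):
SELECT t.*
FROM tags t
JOIN unnest('{foo,bar,baz}'::text[]) WITH ORDINALITY a(name, ord) USING (name)
ORDER BY a.ord;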
Detailed explanation:
PostgreSQL unnest() with element number

Just because it is so difficult to find and deserves to be spread: in MySQL this can be done much more simply, but I don't know whether it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')

With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
This removes the need to manually assign/maintain a position for each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
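If the list comes from application code, the whole array can also be handed in as a single bind parameter instead of being spelled out twice (a sketch, assuming a client that binds $1 as int[]):
SELECT c.*
FROM comments c
WHERE c.id = ANY ($1::int[])
ORDER BY array_position($1::int[], c.id);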

I think this way is better:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC

Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
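In case that wiki page moves, a minimal sketch of such a function looks roughly like this (the wiki version may differ in details):
CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int
LANGUAGE sql IMMUTABLE AS
$$
  SELECT i
  FROM generate_series(array_lower($1,1), array_upper($1,1)) AS i
  WHERE $1[i] = $2
  LIMIT 1;
$$;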

In PostgreSQL:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')

On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?

To do this, I think you should probably have an additional "ORDER" table that defines the mapping of IDs to order (effectively doing what your response to your own question said). You can then join it in as an additional column on your select and sort on it.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
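A sketch of that idea, assuming a hypothetical comment_ordering table that maps each comment id to its position:
CREATE TABLE comment_ordering (
    comment_id integer PRIMARY KEY,
    ordering   integer NOT NULL
);
INSERT INTO comment_ordering (comment_id, ordering)
VALUES (1, 1), (3, 2), (2, 3), (4, 4);
SELECT c.*
FROM comments c
JOIN comment_ordering o ON o.comment_id = c.id
ORDER BY o.ordering;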

Without a SEQUENCE; works only on 8.4 or later:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter

SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j."order"
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id" || ',' IN ',1,3,2,4,')

And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)

create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built in as of 8.3, but you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id

A slight improvement over the version that uses a sequence, I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
RETURNS SETOF record
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT *
FROM comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
    USING (id)
ORDER BY in_sort.ordinal;

select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table that has a column called ids, and ids is an array that stores the comments.id values.
Tested on PostgreSQL 9.6.

Let's get a visual impression of what has already been said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit #user80168

I agree with all the other posters who say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, then add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. ORDER BY comments.sort DESC. If you want to sort these in a different order every time, then SQL won't be for you in this case.
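A sketch of that suggestion, assuming a hypothetical integer column named sort added to comments:
ALTER TABLE comments ADD COLUMN sort integer;
UPDATE comments SET sort = 1 WHERE id = 1;
UPDATE comments SET sort = 2 WHERE id = 3;
UPDATE comments SET sort = 3 WHERE id = 2;
UPDATE comments SET sort = 4 WHERE id = 4;
SELECT * FROM comments ORDER BY comments.sort;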

Related

Intersection of Records in Postgres

Suppose I have labels with multiple stores associated with them like so:
label_id | store_id
--------------------
label_1 | store_1
label_1 | store_2
label_1 | store_3
label_2 | store_2
label_2 | store_3
label_3 | store_1
label_3 | store_2
Is there any good way in SQL (or jooq) to get all the store ids in the intersection of the labels? Meaning just return store_2 in the example above because store_2 is associated with label_1, label_2, and label_3? I would like a general method to handle the case where I have n labels.
This is a relational division problem, where you want the stores that have all possible labels. Here is an approach using aggregation:
select store_id
from mytable
group by store_id
having count(*) = (select count(distinct label_id) from mytable)
Note that this assumes no duplicate (store_id, label_id) tuples. Otherwise, you need to change the having clause to:
having count(distinct label_id) = (select count(distinct label_id) from mytable)
Since you're also looking for a jOOQ solution, jOOQ supports a synthetic relational division operator, which produces a more academic approach to relational division, using relational algebra operators only:
// Using jOOQ
T t1 = T.as("t1");
T t2 = T.as("t2");
ctx.select()
.from(t1.divideBy(t2).on(t1.LABEL_ID.eq(t2.LABEL_ID)).returning(t1.STORE_ID).as("t"))
.fetch();
This produces something like the following query:
select t.store_id
from (
select distinct dividend.store_id
from t dividend
where not exists (
select 1
from t t2
where not exists (
select 1
from t t1
where dividend.store_id = t1.store_id
and t1.label_id = t2.label_id
)
)
) t
In plain English:
Get me all the stores (dividend), for which there exists no label (t2) for which that store (dividend) has no entry (t1)
Or in other words
If there was a label (t2) that a store (dividend) does not have (t1), then that store (dividend) would not have all the available labels.
This isn't necessarily more readable or faster than GROUP BY / HAVING COUNT(*) based implementations of relational division (as seen in other answers); in fact, the GROUP BY / HAVING based solutions are probably preferable here, especially since only one table is involved. A future version of jOOQ might use the GROUP BY / HAVING approach instead: #10450
But in jOOQ, it might be quite convenient to write this way, and you asked for a jOOQ solution :)
Then convert the query by #GMB into an SQL function that takes an array and returns a table of store_id's.
create or replace
function stores_with_all_labels( label_list text[] )
returns table (store_id text)
language sql
as $$
select store_id
from label_store
where label_id = any (label_list)
group by store_id
having count(*) = array_length(label_list,1);
$$;
Then all that's needed is a simple select. See complete example here.
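For instance, a usage sketch, assuming label ids are stored as text as in the sample data:
SELECT store_id
FROM stores_with_all_labels(ARRAY['label_1','label_2','label_3']);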
If there are three particular labels you want, you can use:
select store_id
from t
where label in (1, 2, 3)
group by store_id
having count(*) = 3;
If you want only those three labels and nothing else, then:
select store_id
from t
group by store_id
having count(*) = 3 and
count(*) filter (where label in (1, 2, 3)) = count(*);

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem. They're both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
[screenshot: two remittance rows for the same claim with identical create timestamps]
You can change your query like this:
SELECT
    p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
    Remittance AS p INNER JOIN
    (SELECT MAX(create_datetime) AS DATE_OF_LATEST_REMIT,
            claim_id
     FROM Claims_Group2
     WHERE BILLED_AMOUNT > 0
     GROUP BY claim_id) AS latest_remit
    ON latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database (especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them), it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Order by data as per supplied Id in sql

Query:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
Right now the data is ordered by the auto ID, or by whatever clause I pass in ORDER BY.
But I want the data to come back in the same sequence as the IDs I have passed.
Expected Output:
All Data for 1247881
All Data for 174772
All Data for 808454
All Data for 2326154
Note:
The number of IDs to be passed will be 300,000.
One option would be to create a CTE containing the ration_card_id values and the order which you are imposing, and then join to this table:
WITH cte AS (
SELECT 1247881 AS ration_card_id, 1 AS position
UNION ALL
SELECT 174772, 2
UNION ALL
SELECT 808454, 3
UNION ALL
SELECT 2326154, 4
)
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN cte t2
ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position
Edit:
If you have many IDs, then neither the answer above nor the answer given using a CASE expression will suffice. In this case, your best bet would be to load the list of IDs into a table, containing an auto increment ID column. Then, each number would be labelled with a position as its record is being loaded into your database. After this, you can join as I have done above.
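A rough sketch of that approach in SQL Server (the staging table name, column types, and loading step here are placeholder assumptions):
CREATE TABLE #id_list (
    position       int IDENTITY(1,1) PRIMARY KEY,
    ration_card_id bigint NOT NULL
);
-- load the ~300,000 ids here, in the desired order
INSERT INTO #id_list (ration_card_id) VALUES (1247881), (174772), (808454), (2326154);
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN #id_list t2
    ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position;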
If the desired order does not reflect a sequential ordering of some preexisting data, you will have to specify the ordering yourself. One way to do this is with a case statement:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
ORDER BY CASE ration_card_id
WHEN 1247881 THEN 0
WHEN 174772 THEN 1
WHEN 808454 THEN 2
WHEN 2326154 THEN 3
END
Stating the obvious, but note that this ordering most likely is not represented by any index, and therefore will not benefit from one.
Insert your ration_card_id values into a temp table (#temps) with an identity column.
Rewrite your SQL query as:
SELECT a.*
FROM [MemberBackup].[dbo].[OriginalBackup] a
JOIN #temps b
on a.ration_card_id = b.ration_card_id
order by b.id

Relational algebra without aggregate function

We have a table Phone whose values are
(Name, Number)
--------------
(John, 123)
(John, 456)
(Bravo, 789)
(Ken, 741)
(Ken, 589)
If the question is to Find the guy who uses only one number, the answer is Bravo.
I solved this using aggregate function. But I don't know how to solve without using aggregate function.
Here is my solution:
SELECT *
FROM test t
WHERE NOT EXISTS (
SELECT 1
FROM test
WHERE NAME = t.NAME
AND number <> t.number);
And a sample SQLFiddle.
I'm not sure about this representation in relational algebra (and it's most likely not correct or complete but it might give you a starting point):
RESULT = { (name, number) ∈ TEST | ¬∃ number_2 : (name, number_2) ∈ TEST ∧ number_2 ≠ number }
(this is the main idea; you could have a look here and try to rewrite this correctly, since I haven't written anything in relational algebra for more than 10 years).
Or maybe you're looking for a different type of representation, like this one here?
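For reference, a more conventional way to express the same condition with basic relational algebra operators (just a sketch; notation varies between textbooks) is to build the set of names that occur with more than one number via a self-join, then subtract it from the set of all names:
MoreThanOne = π_{p1.Name}( σ_{p1.Name = p2.Name ∧ p1.Number ≠ p2.Number}( ρ_{p1}(Phone) × ρ_{p2}(Phone) ) )
RESULT = π_{Name}(Phone) − MoreThanOne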
You can use LEFT JOIN and use the same table in your JOIN , something like this..
SELECT a.NAME, a.NUMBER FROM test a
LEFT JOIN test b ON a.name = b.name AND a.number <> b.number
WHERE b.name IS NULL;
Hope this helps. :)
Try the following if you are using Access-SQL -
SELECT Name
FROM Phone
GROUP BY Name
HAVING COUNT(Name) = 1;
Otherwise, try -
SELECT Name
FROM Phone
WHERE COUNT(Name) = 1;
If you have any questions, then please feel free to reply.
You can take advantage of the RANK() function, something like below.
SELECT * FROM #Tbl WHERE Name NOT IN(
SELECT Name FROM (
SELECT Name, RANK() OVER(PARTITION BY Name ORDER BY Id) AS Rank
FROM #Tbl) t
WHERE t.Rank > 1)
The only drawback of this method is that you need a unique id column in your table to get the right result.
SQLFiddle
Another way of doing it is:
select Name from Phone p
where (select name from Phone p2 where p.name = p2.name and p2.number <> p.number limit 1) is null
Edit: added limit 1 to make sure the sub-select returns a scalar.

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less susceptible to mathematical locating (since some results would be filtered out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
and active=1 order by primary_key limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how, due to the gaps in the primary keys caused by the example where condition (for example because there are many inactive items), I'm no longer getting the closest 5 above and 5 below; instead I'm getting the closest 1 below and the closest 9 above.
There are a lot of ways to do it if you run two queries from a programming language, but here's one way to do it in a single SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with the analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause, but we can't, so instead you need to use subqueries or CTEs. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
For a similar query I use analytic functions without a CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
Such a query runs faster for me than the CTE with a join (answer from Scott Bailey), but of course it is less elegant.
You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (I'm not familiar with PostgreSQL), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
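Since you generally won't know in advance which row numbers the target row and its neighbours will land on (25 and 35 are hard-coded above), a variation is to look up the target row's number first and filter relative to it. A sketch in PostgreSQL, using a placeholder table name items and a target primary key of 34:
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
    FROM items
    WHERE active = 1
)
SELECT n.*
FROM numbered n
JOIN numbered target ON target.primary_key = 34
WHERE n.r BETWEEN target.r - 5 AND target.r + 5
ORDER BY n.r;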
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.