sqlite equivalent of row_number() over ( partition by ...?

sqlite equivalent of row_number() over ( partition by ...? - sql

I'd like to know if it's possible to do the following using a single sqlite statement:
My table looks something like this:
|AnId|UserId|SomeDate|SomeData|
|123 |A |1/1/2010|aadsljvs|
| 87 |A |2/9/2010|asda fas|
|193 |A |2/4/2010|aadsljvs|
|927 |A |7/3/2010|aadsasdf|
|816 |B |1/1/2010|aa32973v|
|109 |B |7/5/2010|aaasfd10|
| 39 |B |1/3/2010|66699327|
...
Each row has a unique id, a user id, a datetime value, and some other data.
I'd like to delete records so I keep the latest 10 records per user, based on SomeDate.
In sql server I'd use something like this:
delete d
from data d
inner join (
select UserId
, AnId
, row_number() over ( partition by UserId order by SomeDate desc )
as RowNum
from data
) ranked on d.AnId = ranked.AnId
where ranked.RowNum > 10
Is there a way to do this in sqlite? The edge case where there are several records with the same SomeDate isn't a particular worry, e.g. if I keep all those records that'd be fine.

I know this question is old, but the following SQLite statement will do what Rory was originally asking for in one statement - Delete all records for a given UserId that are not the 10 most recent records for that UserId (based on SomeDate).
DELETE FROM data
WHERE AnId IN (SELECT AnId
FROM data AS d
WHERE d.UserId = data.UserId
ORDER BY SomeDate DESC
LIMIT -1 OFFSET 10)

I needed to fetch the second row for each "object" in a table with a 1 to many relationship to the "object" table.
Usually in SQL this will be done using ROW_NUMBER() OVER(PARTITION BY object_id ORDER BY primary_id DESC)
In Sqlite I had to come up with this voodoo sub-query to get the same result
SELECT object_id, MAX(_id)
FROM (SELECT object_id, _id
FROM aTable
EXCEPT
SELECT object_id, MAX(_id)
FROM aTable
GROUP BY object_id)
GROUP BY object_id;
Note: The _id is the primary key of aTable and the object table has a 1 to many relationship with the queried table

If you already haven't got the answer.
If it's one table, then you don't need any joins. You can just use:
Delete From data
where AnId not in (Select AnId
from data
Order by SomeDate DESC
Limit 10)

As of Sqlite 3.25 window functions are supported.
See https://www.sqlite.org/windowfunctions.html for details.

This might be prohibitively expensive (perhaps only do it when a user inserts a new record?) but how about this:
for user in users:
user-records = select * from records where user=user
if user-records.length > 10:
delete from records where user=user and date<user-records[10]
(in a mix of SQL and pseudocode)

Related

I need a sql query to group by name but return other fields based on the most recent entry

I'm querying a single table called PhoneCallNotes. The caller FirstName, LastName and DOB are recorded for each call as well as many other fields including a unique ID for the call (PhoneNoteID) but no unique ID for the caller.
My requirement is to return a list of callers with duplicates removed along with the PhoneNoteID, etc from their most recent entry.
I can get the list of users I want using a Group By on name, DOB and Max(CreatedOn) but how do I include uniqueID (of the most recent entry in the results?)
select O.CallerFName,O.CallerLName,O.CallerDOB,Max(O.CreatedOn)
from [dbo].[PhoneCallNotes] as O
where O.CallerLName like 'Public'
group by O.CallerFName,O.CallerLName,O.CallerDOB order by Max(O.CreatedOn)
Results:
John Public 4/4/2001 4/6/12 16:42
Joe Public 4/12/1988 4/6/12 16:52
John Public 1/2/1950 4/6/12 17:01
Thanks

You can also write what Andrey wrote somewhat more compactly if you select TOP (1) WITH TIES and put the ROW_NUMBER() expression in the ORDER BY clause:
SELECT TOP (1) WITH TIES
CallerFName,
CallerLName,
CallerDOB,
CreatedOn,
PhoneNoteID
FROM [dbo].[PhoneCallNotes]
WHERE CallerLName = 'Public'
ORDER BY ROW_NUMBER() OVER(
PARTITION BY CallerFName, CallerLName, CallerDOB
ORDER BY CreatedOn DESC
)
(By the way, there's no reason to use LIKE for a simple string comparison.)

Try something like that:
;WITH CTE AS (
SELECT
O.CallerFName,
O.CallerLName,
O.CallerDOB,
O.CreatedOn,
PhoneNoteID,
ROW_NUMBER() OVER(PARTITION BY O.CallerFName, O.CallerLName, O.CallerDOB ORDER BY O.CreatedOn DESC) AS rn
FROM [dbo].[PhoneCallNotes] AS O
WHERE
O.CallerLName LIKE 'Public'
)
SELECT
CallerFName,
CallerLName,
CallerDOB,
CreatedOn,
PhoneNoteID
FROM CTE
WHERE rn = 1
ORDER BY
CreatedOn

Assuming that the set of [FirstName, LastName, DateOfBirth] are unique (#shudder#), I believe the following should work, on pretty much every major RDBMS:
SELECT a.callerFName, a.callerLName, a.callerDOB, a.createdOn, a.phoneNoteId
FROM phoneCallNotes as a
LEFT JOIN phoneCallNotes as b
ON b.callerFName = a.callerFName
AND b.callerLName = a.callerLName
AND b.callerDOB = a.callerDOB
AND b.createdOn > a.createdOn
WHERE a.callerLName LIKE 'Public'
AND b.phoneNoteId IS NULL
Basically, the query is looking for every phone-call-note for a particular name/dob combination, where there is not a more-recent row (b is null). If you have two rows with the same create time, you'll get duplicate rows, though.

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below the records table has unique Records. Each record is updated, via comments through an Update Table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)

Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b

Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1

Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less succeptable to mathematical locating (since some results would be filtered, out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
order by primary_key where active=1 limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how due to the gaps in the primary keys caused by the example where condition (for example becaseu there are many inactive items), I'm no longer getting the closest 5 above and 5 below, instead I'm getting the closest 1 below and the closest 9 above, instead.

There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.

Here's another way to do it with analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause. So instead you need to use subqueries or CTE's. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.

For similar query I use analytic functions without CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
such query works more faster for me than CTE with join (answer from Scott Bailey). But of course less elegant

You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (not familiar with postgresql), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.

If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.

MySQL get row position in ORDER BY

With the following MySQL table:
+-----------------------------+
+ id INT UNSIGNED +
+ name VARCHAR(100) +
+-----------------------------+
How can I select a single row AND its position amongst the other rows in the table, when sorted by name ASC. So if the table data looks like this, when sorted by name:
+-----------------------------+
+ id | name +
+-----------------------------+
+ 5 | Alpha +
+ 7 | Beta +
+ 3 | Delta +
+ ..... +
+ 1 | Zed +
+-----------------------------+
How could I select the Beta row getting the current position of that row? The result set I'm looking for would be something like this:
+-----------------------------+
+ id | position | name +
+-----------------------------+
+ 7 | 2 | Beta +
+-----------------------------+
I can do a simple SELECT * FROM tbl ORDER BY name ASC then enumerate the rows in PHP, but it seems wasteful to load a potentially large resultset just for a single row.

Use this:
SELECT x.id,
x.position,
x.name
FROM (SELECT t.id,
t.name,
#rownum := #rownum + 1 AS position
FROM TABLE t
JOIN (SELECT #rownum := 0) r
ORDER BY t.name) x
WHERE x.name = 'Beta'
...to get a unique position value. This:
SELECT t.id,
(SELECT COUNT(*)
FROM TABLE x
WHERE x.name <= t.name) AS position,
t.name
FROM TABLE t
WHERE t.name = 'Beta'
...will give ties the same value. IE: If there are two values at second place, they'll both have a position of 2 when the first query will give a position of 2 to one of them, and 3 to the other...

This is the only way that I can think of:
SELECT `id`,
(SELECT COUNT(*) FROM `table` WHERE `name` <= 'Beta') AS `position`,
`name`
FROM `table`
WHERE `name` = 'Beta'

If the query is simple and the size of returned result set is potentially large, then you may try to split it into two queries.
The first query with a narrow-down filtering criteria just to retrieve data of that row, and the second query uses COUNT with WHERE clause to calculate the position.
For example in your case
Query 1:
SELECT * FROM tbl WHERE name = 'Beta'
Query 2:
SELECT COUNT(1) FROM tbl WHERE name >= 'Beta'
We use this approach in a table with 2M record and this is way more scalable than OMG Ponies's approach.

The other answers seem too complicated for me.
Here comes an easy example, let's say you have a table with columns:
userid | points
and you want to sort the userids by points and get the row position (the "ranking" of the user), then you use:
SET #row_number = 0;
SELECT
(#row_number:=#row_number + 1) AS num, userid, points
FROM
ourtable
ORDER BY points DESC
num gives you the row postion (ranking).
If you have MySQL 8.0+ then you might want to use ROW_NUMBER()

The position of a row in the table represents how many rows are "better" than the targeted row.
So, you must count those rows.
SELECT COUNT(*)+1 FROM table WHERE name<'Beta'
In case of a tie, the highest position is returned.
If you add another row with same name of "Beta" after the existing "Beta" row, then the position returned would be still 2, as they would share same place in the classification.
Hope this helps people that will search for something similar in the future, as I believe that the question owner already solved his issue.

I've got a very very similar issue, that's why I won't ask the same question, but I will share here what did I do, I had to use also a group by, and order by AVG.
There are students, with signatures and socore, and I had to rank them (in other words, I first calc the AVG, then order them in DESC, and then finally I needed to add the position (rank for me), So I did something Very similar as the best answer here, with a little changes that adjust to my problem):
I put finally the position (rank for me) column in the external SELECT
SET #rank=0;
SELECT #rank := #rank + 1 AS ranking, t.avg, t.name
FROM(SELECT avg(students_signatures.score) as avg, students.name as name
FROM alumnos_materia
JOIN (SELECT #rownum := 0) r
left JOIN students ON students.id=students_signatures.id_student
GROUP BY students.name order by avg DESC) t

I was going through the accepted answer and it seemed bit complicated so here is the simplified version of it.
SELECT t,COUNT(*) AS position FROM t
WHERE name <= 'search string' ORDER BY name

I have similar types of problem where I require rank(Index) of table order by votes desc. The following works fine with for me.
Select *, ROW_NUMBER() OVER(ORDER BY votes DESC) as "rank"
From "category_model"
where ("model_type" = ? and "category_id" = ?)

may be what you need is with add syntax
LIMIT
so use
SELECT * FROM tbl ORDER BY name ASC LIMIT 1
if you just need one row..

ORDER BY the IN value list

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order which in my happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?

You can do it quite easily with (introduced in PostgreSQL 8.2) VALUES (), ().
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering

In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced with in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number

Just because it is so difficult to find and it has to be spread: in mySQL this can be done much simpler, but I don't know if it works in other SQL.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')

With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);

I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC

Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index

In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')

On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?

To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.

sans SEQUENCE, works only on 8.4:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter

SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(','+"comments"."id"+',' IN ',1,3,2,4,')

And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)

create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself(the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
that function can work in any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id

Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;

select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
here, [bbs] is the main table that has a field called ids,
and, ids is the array that store the comments.id .
passed in postgresql 9.6

Lets get a visual impression about what was already said. For example you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an interger and order the list numerical:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit #user80168

I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments then add another integer column to one of your tables to hold your sort criteria and sort by that value. eg "ORDER BY comments.sort DESC " If you want to sort these in a different order every time then... SQL won't be for you in this case.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

sqlite equivalent of row_number() over ( partition by ...? - sql

If you already haven't got the answer. If it's one table, then you don't need any joins. You can just use: Delete From data where AnId not in (Select AnId from data Order by SomeDate DESC Limit 10)

As of Sqlite 3.25 window functions are supported. See https://www.sqlite.org/windowfunctions.html for details.

Related

I need a sql query to group by name but return other fields based on the most recent entry

Remove duplicates (1 to many) or write a subquery that solves my problem

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

MySQL get row position in ORDER BY

ORDER BY the IN value list

Categories

Resources