Select top N records from each group in SQLite

I am trying to select the top 2 records for each subject from a database view whose result looks like this:
SubjectId | StudentId | Levelid | total
------------------------------------------
1 | 1 | 1 | 89
1 | 2 | 1 | 77
1 | 3 | 1 | 61
2 | 4 | 1 | 60
2 | 5 | 1 | 55
2 | 6 | 1 | 45
I tried this query:
SELECT rv.subjectid,
       rv.total,
       rv.Studentid,
       rv.levelid
FROM ResultView rv
LEFT JOIN ResultView rv2
       ON ( rv.subjectid = rv2.subjectid
            AND rv.total <= rv2.total )
GROUP BY rv.subjectid,
         rv.total,
         rv.Studentid
HAVING COUNT( * ) <= 2
order by rv.subjectid desc
but some subjects were missing from the result. I even tried the suggestion from the following link:
How to select the first N rows of each group?
but I get more than two rows for each subjectid.
What am I doing wrong?

You could use a correlated subquery:
select *
from ResultView rv1
where SubjectId || '-' || StudentId || '-' || LevelId in
(
    select SubjectId || '-' || StudentId || '-' || LevelId
    from ResultView rv2
    where SubjectID = rv1.SubjectID
    order by total desc
    limit 2
)
This query builds a single comparable key by concatenating the three columns that identify a row. If you have a real single-column primary key (like ResultViewID) you can substitute it for SubjectId || '-' || StudentId || '-' || LevelId.
Example at SQL Fiddle.
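For illustration, a minimal sketch of that variant, assuming the view exposes a surrogate key column named ResultViewID (not shown in the question):
select *
from ResultView rv1
where ResultViewID in
(
    -- correlated subquery: the 2 highest totals for this row's subject
    select rv2.ResultViewID
    from ResultView rv2
    where rv2.SubjectId = rv1.SubjectId
    order by rv2.total desc
    limit 2
)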

I hope I'm understanding your question correctly. Let me know if this is correct:
I recreated your table:
CREATE TABLE stack (
SubjectId INTEGER(10),
StudentId INTEGER(10),
Levelid INTEGER(10),
total INTEGER(10)
)
;
Inserted values
INSERT INTO stack VALUES
(1,1,1,89),
(1,2,1,77),
(1,3,1,61),
(2,4,1,60),
(2,5,1,55),
(2,6,1,45)
;
If you're trying to get the top 2 per Levelid (ordered by the total field, assuming StudentID is the primary key):
SELECT *
FROM stack AS a
WHERE a.StudentID IN (
SELECT b.StudentID
FROM stack AS b
WHERE a.levelid = b.levelid
ORDER BY b.total DESC
LIMIT 2
)
;
Yields this result:
SubjectId | StudentId | Levelid | total
1 | 1 | 1 | 89
1 | 2 | 1 | 77
Example of top 2 by SubjectId, ordered by total:
SELECT *
FROM stack AS a
WHERE a.StudentID IN (
SELECT b.StudentID
FROM stack AS b
WHERE a.subjectID = b.subjectID
ORDER BY b.total DESC
LIMIT 2
)
;
Result:
SubjectId | StudentId | Levelid | total
1 | 1 | 1 | 89
1 | 2 | 1 | 77
2 | 4 | 1 | 60
2 | 5 | 1 | 55
I hope that was the answer you were looking for.

ROW_NUMBER window function
SQLite supports window functions since version 3.25, so the exact same code that works for PostgreSQL at "Grouped LIMIT in PostgreSQL: show the first N rows for each group?" also works for SQLite.
This could potentially be faster than the other answers so far, as it does not run a correlated subquery.
Supposing you want to get the 2 highest total rows for each SubjectId:
SELECT *
FROM (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY "SubjectId"
            ORDER BY "total" DESC
        ) AS "rnk",
        *
    FROM "mytable"
) sub
WHERE
    "sub"."rnk" <= 2
ORDER BY
    "sub"."SubjectId" ASC,
    "sub"."total" DESC
Tested on SQLite 3.34, PostgreSQL 14.3. GitHub upstream
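If tied totals should not be cut off arbitrarily, RANK() can be swapped in for ROW_NUMBER(); a rough sketch under the same assumptions (it may return more than 2 rows per group when totals tie):
SELECT *
FROM (
    SELECT
        -- RANK() gives tied totals the same rank, so tied rows all survive the filter
        RANK() OVER (
            PARTITION BY "SubjectId"
            ORDER BY "total" DESC
        ) AS "rnk",
        *
    FROM "mytable"
) sub
WHERE
    "sub"."rnk" <= 2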

Related

How to make sure the SQL result is a continuous range?

I have a table like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
2 | 13 | 33
-------------------------------
3 | 15 | 36
-------------------------------
4 | 33 | 50
-------------------------------
5 | 35 | 52
...
-------------------------------
17 | 52 | 80
I want to get result like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
4 | 33 | 50
-------------------------------
17 | 52 | 80
that is, each returned row's low_number is bigger than the previous returned row's high_number.
How do I write SQL to get this result? I use PostgreSQL.
This seems like a recursive CTE problem. You want to choose the first row (by id) and then choose the next row based on that.
The idea is to cycle through the rows, one at a time. Then when the condition is met, transition to that row. And so on.
As a query, this looks like:
with recursive tt as (
      select id, low_number, high_number, row_number() over (order by id) as seqnum
      from t
     ),
     cte as (
      select id, low_number, high_number, seqnum, true as is_change, id as grouping_id
      from tt
      where seqnum = 1
      union all
      select tt.id, tt.low_number, tt.high_number, tt.seqnum, tt.low_number > t.high_number,
             (case when tt.low_number > t.high_number then tt.id else cte.grouping_id end)
      from cte join
           t
           on cte.grouping_id = t.id join
           tt
           on tt.seqnum = cte.seqnum + 1
     )
select *
from cte
where is_change;
Here is a db<>fiddle.
Use the window function LAG() to get a value of a previous row, e.g.
WITH j AS (
SELECT
id,low_number,high_number,
LAG(high_number) OVER (ORDER BY id) AS prev_high_number
FROM t)
SELECT id,low_number,high_number FROM j
WHERE low_number > prev_high_number OR prev_high_number IS NULL;
Demo: db<>fiddle

Order By Id and Limit Offset By Id from a table

I have an issue with a query similar to the following:
select name, number, id
from tableName
order by id
limit 10 offset 5
But in this case I just take 10 elements overall, with offset 5.
Is there a way to set limit and offset by id?
For example if I have a set:
| name | number | id |
|------------------------------------|---|---------------------------------------|
| Ana | 1 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe | 2 | 64ed0011-ef54-4708-a64a-f85228149651 |
and if I skip 1 per id, I should get:
| name | number | id |
|------------------------------------|---|---------------------------------------|
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
    select t.*, row_number() over(partition by name order by id) rn
    from mytable t
) t
where
    rn > :number_of_records_per_group_to_skip
    and rn <= :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by id within groups of records having the same name, and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_keep: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
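If the grouping is actually meant to be per id value, as the expected output in the question suggests, the same pattern can be partitioned by id instead; a rough sketch with skip = 1 and keep = 10 inlined (table and column names taken from the example):
select name, number, id
from (
    -- rank rows within each id group, ordered by the number column
    select t.*, row_number() over(partition by id order by number) rn
    from mytable t
) t
where rn > 1         -- skip 1 row per id
  and rn <= 1 + 10   -- then keep at most 10 rows per id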
This might not be the answer you are looking for but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where id > 1;
Best regards,
Bjarni

How to get 50% records from a table in SQL Server?

Suppose I have a table with 1000 rows and I want 50% of it in the output. How can I do that? Is there any built-in function for it?
Use:
SELECT
TOP 50 PERCENT *
FROM
Table1;
With ROW_NUMBER:
SELECT
TOP 50 PERCENT Row_Number() over (order by Column1) ,*
FROM
Table1;
Note: ROW_NUMBER requires an OVER clause with an ORDER BY column (and optionally PARTITION BY columns)
The top syntax supports a percent modifier, which you can use:
SELECT TOP 50 PERCENT *
FROM mytable
Here is the solution:
select top 50 percent *
from TableName
In T-SQL you can use TOP n PERCENT, but you should also order the output so that what the percentage is taken of is well defined; otherwise the result is indeterminate. By way of a simple example, if rows are inserted unordered (in this case the first insert is 6, not 1):
CREATE TABLE mytable (id INT)
INSERT INTO mytable (id)
VALUES
(6)
, (7)
, (8)
, (9)
, (10)
, (1)
, (2)
, (3)
, (4)
, (5) ;
Then, if we simply ask for the top 50 percent, the output is:
select top 50 percent
id
from mytable
| id |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
But if we use an order by clause then the result is more meaningful.
select top 50 percent
id
from mytable
order by id
| id |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
It was also asked if a similar result could be determined using row_number(), so here is a method
select
id
from (
select
id
, count(*) over(partition by (select 1)) all_count
, row_number() over(order by id) rn
from mytable
) d
where rn <= all_count / 2
| id |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
db<>fiddle here
In databases whose LIMIT clause accepts an expression (SQL Server itself has no LIMIT), another option along the same lines is:
SELECT * FROM table
LIMIT (SELECT COUNT(*)/2 FROM table)

Partitioning function for continuous sequences

There is a table of the following structure:
CREATE TABLE history
(
pk serial NOT NULL,
"from" integer NOT NULL,
"to" integer NOT NULL,
entity_key text NOT NULL,
data text NOT NULL,
CONSTRAINT history_pkey PRIMARY KEY (pk)
);
The pk is a primary key; from and to define a position in the sequence, and the sequence itself, for a given entity identified by entity_key. So the entity has one sequence of 2 rows if the first row has from = 1; to = 2 and the second one has from = 2; to = 3. The point here is that the to of the previous row matches the from of the next one.
The order that determines the "next"/"previous" row is defined by pk, which grows monotonically (since it's a SERIAL).
The sequence does not have to start with 1, and to - from is not necessarily always 1. So it can be from = 1; to = 10. What matters is that the "next" row's from matches the to exactly.
Sample dataset:
pk | from | to | entity_key | data
----+--------+------+--------------+-------
1 | 1 | 2 | 42 | foo
2 | 2 | 3 | 42 | bar
3 | 3 | 4 | 42 | baz
4 | 10 | 11 | 42 | another foo
5 | 11 | 12 | 42 | another baz
6 | 1 | 2 | 111 | one one one
7 | 2 | 3 | 111 | one one one two
8 | 3 | 4 | 111 | one one one three
What I cannot work out is how to partition by "sequences" here so that I could apply window functions to the group that represents a single "sequence".
Let's say I want to use the row_number() function and would like to get the following result:
pk | row_number | entity_key
----+-------------+------------
1 | 1 | 42
2 | 2 | 42
3 | 3 | 42
4 | 1 | 42
5 | 2 | 42
6 | 1 | 111
7 | 2 | 111
8 | 3 | 111
For convenience I created an SQLFiddle with initial seed: http://sqlfiddle.com/#!15/e7c1c
PS: This is not a "give me the codez" question; I did my own research and I'm just out of ideas on how to partition.
It's obvious that I need to LEFT JOIN on next.from = curr.to, but then it's still not clear how to reset the partition when next.from IS NULL.
PS: There will be a 100-point bounty for the most elegant query that provides the requested result.
PPS: the desired solution should be a plain SQL query, not PL/pgSQL, due to some other limitations that are out of scope of this question.
I don’t know if it counts as “elegant,” but I think this will do what you want:
with Lagged as (
    select
        pk,
        case when lag("to",1) over (order by pk) is distinct from "from" then 1 else 0 end as starts,
        entity_key
    from history
), LaggedGroups as (
    select
        pk,
        sum(starts) over (order by pk) as groups,
        entity_key
    from Lagged
)
select
    pk,
    row_number() over (
        partition by groups
        order by pk
    ) as "row_number",
    entity_key
from LaggedGroups
Just for fun & completeness: a recursive solution to reconstruct the (doubly) linked lists of records. [ this will not be the fastest solution ]
NOTE: I commented out the ascending pk condition(s) since they are not needed for the connection logic.
WITH RECURSIVE zzz AS (
SELECT h0.pk
, h0."to" AS next
, h0.entity_key AS ek
, 1::integer AS rnk
FROM history h0
WHERE NOT EXISTS (
SELECT * FROM history nx
WHERE nx.entity_key = h0.entity_key
AND nx."to" = h0."from"
-- AND nx.pk > h0.pk
)
UNION ALL
SELECT h1.pk
, h1."to" AS next
, h1.entity_key AS ek
, 1+zzz.rnk AS rnk
FROM zzz
JOIN history h1
ON h1.entity_key = zzz.ek
AND h1."from" = zzz.next
-- AND h1.pk > zzz.pk
)
SELECT * FROM zzz
ORDER BY ek,pk
;
You can use generate_series() to generate all the rows between the two values. Then you can use the difference of row numbers on that:
select pk, "from", "to",
row_number() over (partition by entity_key, min(grp) order by pk) as row_number
from (select h.*,
(row_number() over (partition by entity_key order by ind) -
ind) as grp
from (select h.*, generate_series("from", "to" - 1) as ind
from history h
) h
) h
group by pk, "from", "to", entity_key
Because you specify that the difference is between 1 and 10, this might actually not have such bad performance.
Unfortunately, your SQL Fiddle isn't working right now, so I can't test it.
Well, this is not exactly a single SQL query, but:
select a.pk as PK, a.entity_key as ENTITY_KEY, b.pk as BPK, 0 as Seq into #tmp
from history a left join history b on a."to" = b."from" and a.pk = b.pk-1
declare @seq int
select @seq = 1
update #tmp set Seq = case when (BPK is null) then @seq-1 else @seq end,
       @seq = case when (BPK is null) then @seq+1 else @seq end
select pk, entity_key, ROW_NUMBER() over (PARTITION by entity_key, seq order by pk asc)
from #tmp order by pk
This is in SQL Server 2008

Select last changed row in sub-query

I have a table product:
id | owner_id | last_activity | box_id
------------------------------------
1 | 2 | 12/19/2014 | null
2 | 2 | 12/13/2014 | null
3 | 2 | 08/11/2014 | null
4 | 2 | 12/11/2014 | 99
5 | 2 | null | 99
6 | 2 | 12/15/2014 | 99
7 | 2 | null | 105
8 | 2 | null | 105
9 | 2 | null | 105
The only variable that I have is owner_id.
I need to select all products of a user, but if a product is in a box then only the latest one from that box should be selected.
Sample output for owner = 2 is the following:
id | owner_id | last_activity | box_id
------------------------------------
1 | 2 | 12/19/2014 | null
2 | 2 | 12/13/2014 | null
3 | 2 | 08/11/2014 | null
6 | 2 | 12/15/2014 | 99
7 | 2 | null | 105
I'm not able to find a way to select the latest product from a box.
My current query, which does not return the correct result but at least executes:
SELECT p.* FROM product p
WHERE p.owner_id = 2
  AND (
        p.box IS NULL
        OR (
             p.box IS NOT NULL
             AND p.id = ( SELECT MAX(pp.id) FROM product pp
                          WHERE pp.box_id = p.box_id )
           )
      )
I tried with dates:
SELECT p.* FROM product p
WHERE p.owner_id = 2
AND (
p.box IS NULL
OR (
p.box IS NOT NULL
AND
p.id = ( SELECT * FROM (
SELECT pp.id FROM product pp
WHERE pp.box_id = p.box_id
ORDER BY last_activity desc
) WHERE rownum = 1
)
)
This gives the error that p.box_id is undefined, since the correlation name p is not visible inside the second level of subquery nesting.
Do you have any ideas how I can solve it?
The ROW_NUMBER analytical function might help with such queries:
SELECT "owner_id", "id", "box_id", "last_activity" FROM
(
SELECT "owner_id", "id", "box_id", "last_activity",
ROW_NUMBER()
OVER (PARTITION BY "box_id" ORDER BY "last_activity" DESC NULLS LAST) rn
-- ^^^^^^^^^^^^^^^
-- descending order, reject nulls after not null values
-- (this is the default, but making it
-- explicit here for self-documentation
-- purpose)
FROM T
WHERE "owner_id" = 2
) V
WHERE rn = 1 or "box_id" IS NULL
ORDER BY "id" -- <-- probably not necessary, but matches your example
See http://sqlfiddle.com/#!4/db775/8
"there can be nulls as a value. If there are nulls in all products inside a box, then MIN(id) should be returned"
Even if it is probably not a good idea to rely on id to order things, if you think you need that, you will have to change the ORDER BY clause to:
... ORDER BY "last_activity" DESC NULLS LAST, "id" DESC
--                                            ^^^^^^^^^ added as a tie-breaker
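Putting it together, the full query would then look roughly like this (same assumptions as above, only the ORDER BY inside OVER changes):
SELECT "owner_id", "id", "box_id", "last_activity" FROM
(
    SELECT "owner_id", "id", "box_id", "last_activity",
           ROW_NUMBER()
           OVER (PARTITION BY "box_id"
                 ORDER BY "last_activity" DESC NULLS LAST, "id" DESC) rn
    FROM T
    WHERE "owner_id" = 2
) V
WHERE rn = 1 or "box_id" IS NULL
ORDER BY "id"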
Use EXISTS:
SELECT
p.*
FROM
product p
WHERE
p.owner_id = 2 AND
( p.box IS NULL OR
(
p.box IS NOT NULL AND
NOT EXISTS
(
SELECT
pp.id
FROM
product pp
WHERE
pp.box_id = p.box_id AND
pp.last_activity > p.last_activity
)
)
)
You can use UNION to first get all rows where box_id is null and then fetch the rows with max id and date where box_id is not null:
SELECT * FROM
(
SELECT id,owner_id,last_activity,box_id FROM product WHERE owner_id = 2 AND box_id IS NULL
UNION
SELECT MAX(id),owner_id,MAX(last_activity),box_id FROM product WHERE owner_id = 2 AND box_id IS NOT NULL GROUP BY owner_id, box_id
) T1
ORDER BY
id