Acquiring offset of some row in SQL - sql

TL;DR: Is there a possibility to get OFFSET position of a particular, known row in SQL, considering some ORDER BY is applied?
So consider a schema like this (simplified):
CREATE TABLE "public"."painting" (
"uuid" uuid NOT NULL DEFAULT uuid_generate_v4(),
"name" varchar NOT NULL,
"score" int4 NOT NULL,
"approvedAt" timestamp,
PRIMARY KEY ("uuid")
);
Like
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
abc4,test4,8,12:00
abc5,test5,6,7:00
I want to make a request sorted by score and limited with 3 items, and I should emphasize that multiple entities might have the same score.
Because of the dynamic nature of that table, a new item might appear somewhere in the list while I am traversing those items sorted by score.
If I use the SQL OFFSET clause, the new entity shifts every entity below it down by one row, so the next selection will contain the item that was last in the previous 3-item selection.
abc1,test1,10,10:00
abc2,test2,9,11:00
abc6,test6,8,15:00 (new item)
CURRENT OFFSET = 3
abc3,test3,8,8:00 (was in previous select)
abc4,test4,8,12:00
abc5,test5,6,7:00
To avoid that, instead of using OFFSET, I can remember the UUID of the item I fetched last, so it'll be abc3. On the next request, I can use its score to add an extra WHERE SCORE < 8 condition, but this will skip abc4, because it also has a score of 8.
If I use WHERE SCORE <= 8, this will again return abc3, which is already traversed. I can't use another field in the WHERE clause, because this will affect the results. An additional ORDER BY won't help either.
It seems to me that it is a very common problem in database selection, yet I can't find one comprehensive answer.
So, my question then, if it's possible to do some kind of request like following:
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET %position of `abc3`% LIMIT 3
Or alternatively
SELECT OFFSET OF (`abc3`) FROM "painting" WHERE SCORE <= :score ORDER BY "score" DESC LIMIT 3
That will return 2 (because it's the second row with such score), then do
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET :offset LIMIT 3
where :score is the score of last received item and :offset is the result of SELECT OFFSET - 1
My own assumption is that we have to SELECT WHERE "score" = :score, and get offset position outside the SQL (or make a very complex SQL query). Though, if we have a lot of items with similar ORDER BY attribute, this helper request might end up being heavier than the data fetch itself.
Yet, I feel like that there's a much more clever SQL way of doing what I'm trying to do.

Good question. Accurate backend pagination requires the underlying data to be ordered by a set of columns that together form a UNIQUE key.
In your case the ordering can be made unique by adding the column uuid to it. With that in mind, you can increase the page size by 1 behind the scenes, to 4. That 4th row won't be displayed; it is only used to retrieve the next page.
For example, you can get:
select *
from painting
order by -score, "approvedAt", uuid
limit 4
Now you would display the first three rows:
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
The client app (most likely the UI) will remember -- not display -- the 4th row (the "key") to retrieve the next page:
abc4,test4,8,12:00
Then, to get the next page the query will add a WHERE clause with the "key" and take the form:
select *
from painting
where (-score, "approvedAt", uuid) >= (-8, '12:00', 'abc4')
order by -score, "approvedAt", uuid
limit 4
This query starts at the original 4th row, so rows you have already displayed are not repeated even if a new row was inserted in the meantime.
To get blazing fast data retrieval you could create the index:
create index ix1 on painting ((-score), "approvedAt", uuid);
See example at DB Fiddle.
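As a rough, runnable illustration of the "fetch one extra row and remember it as the key" idea, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The column name approved_at, the zero-padded time strings and the helper fetch_page are assumptions made for this example only; the row-value comparison is the same technique as in the queries above.

```python
import sqlite3

PAGE_SIZE = 3

def fetch_page(conn, key=None):
    """Return (rows_to_display, next_key); next_key is None on the last page."""
    sql = "SELECT uuid, name, score, approved_at FROM painting"
    params = []
    if key is not None:
        # Row-value comparison: resume exactly at the remembered key row.
        sql += " WHERE (-score, approved_at, uuid) >= (?, ?, ?)"
        params.extend(key)
    sql += " ORDER BY -score, approved_at, uuid LIMIT ?"
    params.append(PAGE_SIZE + 1)  # one extra row becomes the next key
    rows = conn.execute(sql, params).fetchall()
    shown, extra = rows[:PAGE_SIZE], rows[PAGE_SIZE:]
    next_key = (-extra[0][2], extra[0][3], extra[0][0]) if extra else None
    return shown, next_key

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE painting (uuid TEXT PRIMARY KEY, name TEXT NOT NULL, "
    "score INTEGER NOT NULL, approved_at TEXT)"
)
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?, ?)",
    [
        ("abc1", "test1", 10, "10:00"),
        ("abc2", "test2", 9, "11:00"),
        ("abc3", "test3", 8, "08:00"),
        ("abc4", "test4", 8, "12:00"),
        ("abc5", "test5", 6, "07:00"),
    ],
)

page1, key = fetch_page(conn)
# A new row arrives between the two requests.
conn.execute("INSERT INTO painting VALUES ('abc6', 'test6', 8, '15:00')")
page2, key2 = fetch_page(conn, key)

print([row[0] for row in page1])
print([row[0] for row in page2])
```

With this ordering the new row abc6 sorts after abc4, so it shows up on the second page and abc3 is not repeated.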

Related

Returning Only Latest Result For Location in SQL [duplicate]

This question already has answers here:
How do I limit the number of rows returned by an Oracle query after ordering?
(14 answers)
Closed last year.
I'm wanting to find the very last pallet placed in a given batch of locations within a warehouse.
I currently have:
SELECT
max(datreg) AS "_Reg Date",
logguser,
mha,
rack,
horcoor,
vercoor
FROM
L16T3
WHERE
l16lcode = '3'
AND
rack = #('Rack?',rack)
AND
horcoor >= #('Loc From?',horcoor)
AND
horcoor <= #('Loc To?',horcoor)
ORDER BY 1
LIMIT 1
I thought this would return just the last pallet placed in that specific location, but I'm still getting like 4 entries for one location.
I would only want the highlighted result, as that is the most recent pallet placed in 110-001-04:
I'm sure this is super simple but I'm just starting out :)
You can use a combination of ORDER BY and LIMIT to achieve what you want.
Limit
In a lot of other databases, this is called LIMIT, but I missed that you are using an Oracle database, which has a different dialect of SQL. In Oracle, the most direct equivalent of a limit is:
FETCH FIRST n ROWS ONLY
This means that your query can return at most n rows. So, for example, FETCH FIRST 1 ROWS ONLY means that it can return at most 1 row. The issue is that it takes rows from the start of the result set, not the end (and despite the wording implying FETCH LAST n ROWS ONLY would be a thing, it doesn't seem to be); you can essentially think of it as cutting off the rows below the given limit.
For example, if I have rows in order "A", "B", and "C", FETCH FIRST 1 ROWS ONLY only returns "A". If "C" was really the one I wanted (e.g. the row at the bottom), then I would need to add an ORDER BY clause to first order the results so that the one I want is at the top.
Order By
ORDER BY column dir orders your results by a specific column in a specific direction, e.g. in ascending (ASC) or descending (DESC) order. The syntax actually allows for more complex ordering (e.g. ordering by multiple columns or by a function), but for this simple case this should do what we need.
Putting it together
You want to order so that your desired row is at the top of your table, then you want to limit your results set to contain at most one row.
Adding something like this to the end of your query should work:
ORDER BY "_Reg Date" DESC
FETCH FIRST 1 ROWS ONLY
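To see the "sort descending, then keep one row" pattern in action, here is a small sketch using Python's sqlite3 module. SQLite's LIMIT 1 stands in for Oracle's FETCH FIRST 1 ROWS ONLY, and the pallet table with its dates is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per pallet placement.
conn.execute("CREATE TABLE pallet (location TEXT, datreg TEXT)")
conn.executemany(
    "INSERT INTO pallet VALUES (?, ?)",
    [
        ("110-001-04", "2023-01-01"),
        ("110-001-04", "2023-03-15"),
        ("110-001-04", "2023-02-10"),
    ],
)

# Newest row first, then cut everything below the first row.
latest = conn.execute(
    "SELECT location, datreg FROM pallet ORDER BY datreg DESC LIMIT 1"
).fetchone()
print(latest)
```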

Dealing with value based pagination given a number of a page

When using value based pagination
select * from articles
where id > #start_value
limit #page_size
how can I calculate #start_value given only page number?
Namely: say, I had a website and html page with a list of articles that I needed to paginate. But even to render the very 1st page, I'd need to calculate #start_value somehow. The input from a user would be a number of a page which he clicked; for the very first page it'd be 1 - by default.
Given that 1, how would I calculate #start_value?
Or given any random page, how would I calculate #start_value?
Note that the values of the column id of a table aren't necessarily sequential, even if id is autoincremented.
First off, pagination without any sorting is not ideal. You can't guarantee how SQL will sort your results without including an ORDER BY clause.
You will also need to know the page size. Note that what you can calculate from a page number is an offset (a number of rows to skip), not an id value: given #page_num and #page_size, the offset is #page_num * #page_size, with pages numbered from 0 (subtract 1 from the page number first if your first page is 1).
Here it is without the where clause and with limit/offset instead:
select *
from articles
order by id
limit #page_size
offset (#page_size * #page_num)
You don't need the "where id > ..." part. The right way of achieving this is the limit #page_size offset #offset construct. This way you don't have to worry about the gaps. To calculate the offset from the page number, you just multiply the page size by the page number. Another important thing is that you should order your rows if you want to always get the same result. If you don't trust the IDs, you can order by date or another field. So:
select * from articles
order by date
limit #page_size
offset (#page_size * (#page_num-1))
Note: I used (#page_num-1) to start with a 0 offset for page 1.
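A minimal sketch of the page-number-to-offset calculation, using Python's sqlite3 module. The helper names page_offset and fetch_page and the sample ids (deliberately non-sequential) are assumptions for illustration.

```python
import sqlite3

def page_offset(page_num, page_size):
    """Rows to skip for a 1-based page number."""
    if page_num < 1:
        raise ValueError("page numbers start at 1")
    return (page_num - 1) * page_size

def fetch_page(conn, page_num, page_size):
    rows = conn.execute(
        "SELECT id FROM articles ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page_offset(page_num, page_size)),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY)")
# Gaps in the ids do not matter, because OFFSET counts rows, not id values.
conn.executemany(
    "INSERT INTO articles VALUES (?)", [(i,) for i in (1, 2, 5, 9, 10, 14, 20)]
)

print(fetch_page(conn, 1, 3))
print(fetch_page(conn, 2, 3))
print(fetch_page(conn, 3, 3))
```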

Wrapping a range of data

How would I select a rolling/wrapping* set of rows from a table?
I am trying to select a number of records (per type, 2 or 3) for each day, wrapping when I 'run out'.
Eg.
2018-03-15: YyBiz, ZzCo, AaPlace
2018-03-16: BbLocation, CcStreet, DdInc
These are rendered within a SSRS report for Dynamics CRM, so I can do light post-query operations.
Currently I get to:
2018-03-15: YyBiz, ZzCo
2018-03-16: AaPlace, BbLocation, CcStreet
First, getting a number for each record with:
SELECT name, ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) as RN
FROM table
Within SSRS, I then adjust RN to reflect the number of each type I need:
OnPageNum = FLOOR((RN+num_of_type-1)/num_of_type)-1
--Shift RN to be 0-indexed.
Resulting in AaPlace, BbLocation and CcStreet having a PageNum of 0, DdInc of 1, ... YyBiz and ZzCo of 8.
Then using an SSRS Table/Matrix linked to the dataset, I set the row filter to something like:
RowFilter = MOD(DateNum, NumPages(type)) == OnPageNum
Where DateNum is essentially days since epoch, and each page has a separate table and day passed in.
At this point, it is showing only N records of a type per page, but if the total number of records of a type isn't a multiple of the number of records per page of that type, there will be pages with fewer records than required.
Is there an easier way to approach this/what's the next step?
*Wrapping such as Wraparound found in videogames, seamless resetting to 0.
To achieve this effect, I found that offsetting the RowNumber by -DateNum*num_of_type (negative for positive ordering), then modulo COUNT(type) would provide the correct "wrap around" effect.
In order to achieve the desired pagination, it then just had to be divided by num_of_type and floor'd, as below:
RowFilter: FLOOR(((RN-DateNum*num_of_type) % count(type))/num_of_type) == 0
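The filter can be checked outside SSRS with a short Python sketch. It assumes RN is 0-indexed (as in the question's comment) and relies on Python's % always returning a non-negative result for a positive divisor; other environments may return negative remainders and would need an adjustment. The function names are made up for the example.

```python
def on_page(rn, date_num, per_page, total):
    """True if the row with 0-indexed number rn is shown on day date_num."""
    # Shift the row number back by the rows consumed on earlier days,
    # wrap around the total count, then keep only the first "page".
    return ((rn - date_num * per_page) % total) // per_page == 0

def rows_for_day(date_num, per_page, total):
    return [rn for rn in range(total) if on_page(rn, date_num, per_page, total)]

# 8 records of one type, 3 shown per day.
for day in range(4):
    print(day, rows_for_day(day, 3, 8))
```

On day 2 the selection wraps: rows 6 and 7 are joined by row 0, so every day still shows three records.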

Solution for allowing user sorting in SQlite

By user sorting I mean that as a user on the site you see a bunch of items, and you are supposed to be able to reorder them (I'm using jQuery UI).
The user only sees 20 items on each page, but the total number of items can be thousands.
I assume I need to add another column in the table for custom ordering.
If the user sees items 41-60, and he sorts them like:
41 = 2nd
42 = 1st
43 = fifth
etc.
I can't just set the ordering column to 2,1,5.
I would need to go through the entire table and change each record.
Is there any way to avoid this and somehow sort only the current selection?
Add another column to store the custom order, just as you suggested yourself. You can avoid the problem of having to reassign all rows' values by using a REAL-typed column: for new rows, you still use an increasing integer sequence for the column's value. But if a user reorders a row, the floating-point type lets you use the formula ½ (previous row's value + next row's value) to update the column of the single row that was moved. There are two special cases to take care of, namely when a user moves a row to the very beginning or the very end of the list. In those cases, just use min - 1 or max + 1, respectively.
This approach is the simplest I can think of, but it also has some downsides. First, it has a theoretical limitation due to the datatype having only double-precision. After a finite number of reorderings, the values are too close together for their average to be a different number. But that's really only a theoretical limit you should never reach in practical applications. Also, the column will use 8 bytes of memory per row, which probably is much more than you actually need.
If your application might scale to the point where those 8 bytes matter or where you might have users that overeagerly reorder rows, you should instead stick to the INTEGER column and use multiples of a constant number as the default values (e.g. 100, 200, 300, ..). You still use the update formula from above, but whenever two values become too close together, you reassign all values. By tweaking the constant multiplier to the average table size / user behaviour, you can control how often this expensive operation has to be done.
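The midpoint rule, including the two edge cases, fits in a few lines. This is only a sketch in Python; new_position and the sample positions are hypothetical names and values, and the precision limit discussed above still applies.

```python
def new_position(prev_pos, next_pos):
    """Sort value for a row dropped between two neighbours.

    prev_pos / next_pos are the neighbours' sort values, or None when the
    row is moved to the very beginning or the very end of the list.
    """
    if prev_pos is None and next_pos is None:
        return 1.0                      # the list was empty
    if prev_pos is None:
        return next_pos - 1             # moved to the beginning: min - 1
    if next_pos is None:
        return prev_pos + 1             # moved to the end: max + 1
    return (prev_pos + next_pos) / 2    # midpoint of the two neighbours

positions = {"a": 1.0, "b": 2.0, "c": 3.0}
# The user drags "c" between "a" and "b": only one row is updated.
positions["c"] = new_position(positions["a"], positions["b"])
print(sorted(positions, key=positions.get))
```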
There are a couple ways I can think of to do this. One would be to use a SELECT FROM SELECT style statement. As in something like this.
SELECT *
FROM (
SELECT col1, col2, col3...
FROM ...
WHERE ...
LIMIT n,m
) as Table_A
ORDER BY ...
The second option would be to use temp tables such as:
INSERT INTO temp_table_A SELECT ... FROM ... WHERE ... LIMIT n,m;
SELECT * FROM temp_table_A ORDER BY ...
Another option to look at would be jQuery plugin like DataTables
One way I can think of is:
Add a new column (if feasible) or create a new table for holding the order of the items.
On any page you will show around 20 items based on the initial ordering.
Using the jquery's Draggable you can send updates to this table
I think you can do this with an extra column.
First, you could prepopulate this new column with a default sort order and then allow the user to interactively modify it with the drag and drop of jquery-ui.
Let's say this user has 100 items in the table. You set the values in the order column to [1,2,3,...,99,100]. I suggest that you run a script on the original table to set all items to a default sort order.
Now going back to your example where the user is presented with items 41-60: the initial presentation in their browser would rank those at orders [41,42,43,...,59,60]. You might also need to save the lowest order that appears in this subset, in this case 41. Or better yet, save the entire array of rankings and restore the exact same numbers in the new order. This covers the case where they select a set of records that are not already consecutively ordered, perhaps because they belong to someone else.
To demonstrate what I mean: when they reorder them in the page, your javascript reassigns those same numbers back to the subset in the new order. Like this:
item A : 41
item B : 45
item C : 46
item D : 47
item E : 51
item F : 54
item G : 57
then the user changes them to this order, but you reassign the numbers like this:
item D : 41
item F : 45
item E : 46
item A : 47
item C : 51
item B : 54
item G : 57
This should also work if the subset is consecutive.
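The reassignment step could look roughly like this. The answer describes doing it in JavaScript on the page; the sketch below uses Python only to show the logic, and the function name reassign is made up.

```python
def reassign(old_orders, new_sequence):
    """Hand the same order numbers back out, following the user's new sequence.

    old_orders:   item -> order number as originally loaded for this page
    new_sequence: the items in the order the user arranged them
    """
    slots = sorted(old_orders.values())  # the numbers this page "owns"
    return dict(zip(new_sequence, slots))

old_orders = {"A": 41, "B": 45, "C": 46, "D": 47, "E": 51, "F": 54, "G": 57}
new_sequence = ["D", "F", "E", "A", "C", "B", "G"]
new_orders = reassign(old_orders, new_sequence)
print(new_orders)
```

Only the rows on the current page need an UPDATE, since no number outside the page's own set is ever used.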

table design + SQL question

I have a table foodbar, created with the following DDL. (I am using mySQL 5.1.x)
CREATE TABLE foodbar (
id INT NOT NULL AUTO_INCREMENT,
user_id INT NOT NULL,
weight double not null,
created_at date not null
);
I have four questions:
1. How may I write a query that returns a result set that gives me the following information: user_id, weight_gain where weight_gain is the difference between a weight and a weight that was recorded 7 days ago.
2. How may I write a query that will return the top N users with the biggest weight gain (again say over a week).? An 'obvious' way may be to use the query obtained in question 1 above as a subquery, but somehow picking the top N.
3. Since in question 2 (and indeed question 1), I am searching the records in the table using a calculated field, indexing would be preferable to optimise the query - however since it is a calculated field, it is not clear which field to index (I'm guessing the 'weight' field is the one that needs indexing). Am I right in that assumption?.
4. Assuming I had another field in the foodbar table (say 'height') and I wanted to select records from the table based on (say) the product (i.e. multiplication) of 'height' and 'weight' - would I be right in assuming again that I need to index 'height' and 'weight'?. Do I also need to create a composite key (say (height,weight)). If this question is not clear, I would be happy to clarify
I don't see why you should need the synthetic key, so I'll use this table instead:
CREATE TABLE foodbar (
user_id INT NOT NULL
, created_at date not null
, weight double not null
, PRIMARY KEY (user_id, created_at)
);
How may I write a query that returns a result set that gives me the following information: user_id, weight_gain where weight_gain is the difference between a weight and a weight that was recorded 7 days ago.
SELECT curr.user_id, curr.weight - prev.weight AS weight_gain
FROM foodbar curr, foodbar prev
WHERE curr.user_id = prev.user_id
AND curr.created_at = CURRENT_DATE
AND prev.created_at = CURRENT_DATE - INTERVAL 7 DAY
;
(The date arithmetic is written in MySQL's INTERVAL syntax; other databases spell it differently.)
How may I write a query that will return the top N users with the biggest weight gain (again say over a week).? An 'obvious' way may be to use the query obtained in question 1 above as a subquery, but somehow picking the top N.
see above, add ORDER BY curr.weight - prev.weight DESC and LIMIT N
For the last two questions: don't speculate, examine execution plans (PostgreSQL has EXPLAIN ANALYZE; MySQL has EXPLAIN). You'll probably find you need to index columns that participate in WHERE and JOIN, not the ones that form the result set.
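Here is a small runnable sketch of the self-join plus ORDER BY ... LIMIT, using Python's sqlite3 module instead of MySQL. The dates are passed in as parameters rather than taken from CURRENT_DATE so the result is reproducible; the helper top_gainers and the sample data are assumptions for the example.

```python
import sqlite3
from datetime import date, timedelta

def top_gainers(conn, today, n):
    """Top n users by weight gained between (today - 7 days) and today."""
    week_ago = today - timedelta(days=7)
    return conn.execute(
        """
        SELECT curr.user_id, curr.weight - prev.weight AS weight_gain
        FROM foodbar curr
        JOIN foodbar prev ON curr.user_id = prev.user_id
        WHERE curr.created_at = ? AND prev.created_at = ?
        ORDER BY weight_gain DESC
        LIMIT ?
        """,
        (today.isoformat(), week_ago.isoformat(), n),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foodbar (user_id INTEGER NOT NULL, created_at TEXT NOT NULL, "
    "weight REAL NOT NULL, PRIMARY KEY (user_id, created_at))"
)
conn.executemany(
    "INSERT INTO foodbar VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 80.0), (1, "2024-01-08", 82.5),
        (2, "2024-01-01", 70.0), (2, "2024-01-08", 69.0),
        (3, "2024-01-01", 60.0), (3, "2024-01-08", 64.0),
    ],
)

result = top_gainers(conn, date(2024, 1, 8), 2)
print(result)
```

Users without a reading on both exact dates drop out of the join, which is the main limitation of this approach.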
I think that "just somebody" covered most of what you're asking, but I'll just add that indexing columns that take part in a calculation is unlikely to help you at all unless it happens to be a covering index.
For example, it doesn't help to order the following rows by X, Y if I want to get them in the order of their product X * Y:
X Y
1 8
2 2
4 4
The products would order them as:
X Y Product
2 2 4
1 8 8
4 4 16
If mySQL supports calculated columns in a table and allows indexing on those columns then that might help.
I agree with just somebody regarding the primary key, but for what you're asking regarding the weight calculation, you'd be better off storing the delta rather than the weight:
CREATE TABLE foodbar (
user_id INT NOT NULL,
created_at date not null,
weight_delta double not null,
PRIMARY KEY (user_id, created_at)
);
It means you'd store the user's initial weight in, say, the user table, and when you write records to the foodbar table, a user could supply the weight at that time, but the query would subtract the initial weight from the current weight. So you'd see values like:
user_id weight_delta
------------------------
1 2
1 5
1 -3
Looking at that, you know that user 1 gained 4 pounds/kilos/stones/etc.
This way you could use SUM, because it's possible for someone to have weighings every day - using just somebody's equation of curr.weight - prev.weight wouldn't work, regardless of time span.
Getting the top x is easy in MySQL - use the LIMIT clause, but mind that you provide an ORDER BY to make sure the limit is applied correctly.
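The SUM idea can be tried out with Python's sqlite3 module. This sketch assumes each weight_delta row holds the change since the previous reading, which is how the 2, 5, -3 example above adds up to 4; the dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foodbar (user_id INTEGER NOT NULL, created_at TEXT NOT NULL, "
    "weight_delta REAL NOT NULL, PRIMARY KEY (user_id, created_at))"
)
conn.executemany(
    "INSERT INTO foodbar VALUES (?, ?, ?)",
    [(1, "2024-01-01", 2), (1, "2024-01-02", 5), (1, "2024-01-03", -3)],
)

# Net change for user 1 over a chosen period, however many readings it holds.
total = conn.execute(
    "SELECT SUM(weight_delta) FROM foodbar "
    "WHERE user_id = ? AND created_at BETWEEN ? AND ?",
    (1, "2024-01-01", "2024-01-07"),
).fetchone()[0]
print(total)
```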
It's not obvious, but there's some important information missing in the problem you're trying to solve. It becomes more noticeable when you think about realistic data going into this table. The problem is that you're unlikely to have a consistent regular daily record of users' weights. So you need to clarify a couple of rules around determining 'current-weight' and 'weight x days ago'. I'm going to assume the following simplistic rules:
The most recent weight reading is the 'current-weight'. (Even though that could be months ago.)
The most recent weight reading more than x days ago will be the weight assumed at x days ago. (Even though for example a reading from 6 days ago would be more reliable than a reading from 21 days ago when determining weight 7 days ago.)
Now to answer the questions:
1&2: Using the above extra rules provides an opportunity to produce two result sets: current weights, and previous weights:
Current weights:
select rd.*,
w.Weight
from (
select User_id,
max(Created_at) AS Read_date
from Foodbar
group by User_id
) rd
inner join Foodbar w on
w.User_id = rd.User_id
and w.Created_at = rd.Read_date
Similarly for the x days ago reading:
select rd.*,
w.Weight
from (
select User_id,
max(Created_at) AS Read_date
from Foodbar
where Created_at < DATE_SUB(CURDATE(), INTERVAL 7 DAY) /* MySQL; SQL Server would use DATEADD(dd, -7, GETDATE()) */
group by User_id
) rd
inner join Foodbar w on
w.User_id = rd.User_id
and w.Created_at = rd.Read_date
Now simply join these results as subqueries
select cur.User_id,
cur.Weight as Cur_weight,
prev.Weight as Prev_weight,
cur.Weight - prev.Weight as Weight_change
from (
/*Insert query #1 here*/
) cur
inner join (
/*Insert query #2 here*/
) prev on
prev.User_id = cur.User_id
If I remember correctly the MySql syntax to get the top N weight gains would be to simply add:
ORDER BY cur.Weight - prev.Weight DESC limit N
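Putting the pieces together, the "latest reading" approach might be sketched as follows with Python's sqlite3 module. The reference date is a parameter so the example is reproducible; weight_changes, the LATEST template and the sample readings are assumptions for illustration, not part of the queries above.

```python
import sqlite3
from datetime import date, timedelta

# Latest reading per user strictly before a cutoff date (one ? parameter).
LATEST = """
    SELECT w.user_id, w.weight
    FROM (SELECT user_id, MAX(created_at) AS read_date
          FROM foodbar WHERE created_at < ? GROUP BY user_id) rd
    JOIN foodbar w ON w.user_id = rd.user_id AND w.created_at = rd.read_date
"""

def weight_changes(conn, today, days, n):
    tomorrow = (today + timedelta(days=1)).isoformat()  # so "current" includes today
    cutoff = (today - timedelta(days=days)).isoformat()
    sql = f"""
        SELECT cur.user_id, cur.weight - prev.weight AS weight_change
        FROM ({LATEST}) cur
        JOIN ({LATEST}) prev ON prev.user_id = cur.user_id
        ORDER BY weight_change DESC
        LIMIT ?
    """
    return conn.execute(sql, (tomorrow, cutoff, n)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foodbar (user_id INTEGER NOT NULL, created_at TEXT NOT NULL, "
    "weight REAL NOT NULL, PRIMARY KEY (user_id, created_at))"
)
conn.executemany(
    "INSERT INTO foodbar VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 80.0), (1, "2024-01-05", 81.0), (1, "2024-01-10", 83.0),
        (2, "2023-12-20", 70.0), (2, "2024-01-09", 68.0),
        (3, "2024-01-08", 60.0),  # no reading older than 7 days
    ],
)

changes = weight_changes(conn, date(2024, 1, 10), 7, 5)
print(changes)
```

User 3 has no reading from more than 7 days ago, so the inner join leaves that user out.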
3&4: Choosing indexes requires a little understanding of how the query optimiser will process the query:
The important thing when it comes to index selection is which columns you are filtering by or joining on. The optimiser will use an index if it judges it to be selective enough (note that sometimes your filters have to be extremely selective, returning < 1% of the data, to be considered useful). There's always a trade-off between the slow disk seek times of navigating indexes and simply processing all the data in memory.
3: Although weights feature significantly in what you display, their only relevance to filtering (or selection) is in #2, to get the top N weight gains. That is a complex calculation built on several subqueries and a lot of processing that has gone before, so an index on Weight will provide zero benefit.
Another note is that even for #2 you have to calculate the weight change of all users in order to determine which have gained the most. Therefore, unless you have a very large number of readings per user, you will read most of the table. (I.e. a table scan will be used to obtain the bulk of the data.)
Where indexes can benefit:
You are trying to identify specific Foodbar rows based on User_id and Created_at.
You are also joining back to the Foodbar table again using User_id and Created_at.
This implies an index on User_id, Created_at would be useful (more so if this is the clustered index).
4: No, unfortunately it is mathematically impossible to determine how the individual values H and W would independently determine the ordering of the product. E.g. both H=3 & W=3 are less than 5, yet if H=5 and W=1 then the product 3*3 is greater than 5*1.
You would have to actually store the calculation in an additional column and put an index on that column. However, as indicated in my answer to #3 above, it is still unlikely to prove beneficial.
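For what it's worth, storing the product and indexing it could look like the sketch below, written with Python's sqlite3 module. The table body, the column hw and the helper add_row are hypothetical names; whether the optimiser actually uses the index depends on the database and the query, so check the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the product kept in its own column "hw".
conn.execute(
    "CREATE TABLE body (id INTEGER PRIMARY KEY, height REAL, weight REAL, hw REAL)"
)
conn.execute("CREATE INDEX ix_body_hw ON body (hw)")

def add_row(conn, height, weight):
    # The application must keep hw in sync with height and weight.
    conn.execute(
        "INSERT INTO body (height, weight, hw) VALUES (?, ?, ?)",
        (height, weight, height * weight),
    )

for h, w in [(3, 3), (5, 1), (2, 2)]:
    add_row(conn, h, w)

by_product = conn.execute(
    "SELECT height, weight, hw FROM body ORDER BY hw DESC"
).fetchall()
print(by_product)
```

The H=3, W=3 row sorts ahead of H=5, W=1, which no ordering on height or weight alone would give you.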