Should paging be zero-indexed within an API?

When implementing a REST API with parameters for paging, should paging be zero-indexed or start at 1? The parameters would be Page and PageSize.
For me, it makes sense to start at 1, since we are talking about pages.

There's no standard for it. Just have a look around: there are hundreds of thousands of APIs using different approaches.
Most of the APIs I know use one of the following approaches for pagination:
offset and limit or
page and size
Both can be 0 or 1 indexed. Which is better? That's up to you.
Just pick the one that fits your needs and document it properly.
Additionally, you could provide some links in the response payload to make navigation between the pages easier.
Consider, for example, that you are reading data from page 2: provide a link to the previous page (page 1) and to the next page (page 3):
{
  "data": [
    ...
  ],
  "paging": {
    "previous": "http://api.example.com/foo?page=1&size=10",
    "next": "http://api.example.com/foo?page=3&size=10"
  }
}
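As an illustration, here is a minimal sketch in TypeScript of how such links could be generated; the function name, the base URL handling, and the totalPages parameter are assumptions for illustration only, not part of any standard:

// Minimal sketch: build previous/next paging links for a response payload.
// buildPagingLinks, the base URL, and totalPages are illustrative assumptions.
interface PagingLinks {
  previous?: string;
  next?: string;
}

function buildPagingLinks(baseUrl: string, page: number, size: number, totalPages: number): PagingLinks {
  const links: PagingLinks = {};
  if (page > 1) {
    links.previous = `${baseUrl}?page=${page - 1}&size=${size}`; // pages are 1-indexed here
  }
  if (page < totalPages) {
    links.next = `${baseUrl}?page=${page + 1}&size=${size}`;
  }
  return links;
}

// Page 2 of 3, matching the payload above.
console.log(buildPagingLinks("http://api.example.com/foo", 2, 10, 3));
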
And remember, always make an API you would love to use.

True, there's no standard for this.
I find that older Microsoft-based products (DAO for Visual Basic 6, Visual C++ 6, and similar) tended to start their pagination at 1, but many other tech stacks use 0, and gradually I find more and more libraries using 0 instead of 1.
Why is this? Because, mathematically speaking, it's easier to map a pageIndex starting from 0 to a rowNumber in a database or array. Suppose you have a dataset fetched from a table in a DB with 100 records, and you want to send the second page (pageSize = 10, for example). With pageIndex starting from 0, you only need to write:
startRowNumber = pageIndex * pageSize;
return dataSet[startRowNumber, startRowNumber + pageSize];
Because in most DBs and languages, arrays/lists are 0-indexed. And even if your REST API language uses 1-indexed arrays, you would still have a problem when mapping a 1-indexed pageIndex to record IDs. For example: suppose you have a dataset indexed 1..100 (not 0..99), and you want to send the 11th to 20th records as the second page (here pageSize = 10 and pageIndex = 2, because in this case you start at 1). This means you need the formula
startRowNumber = ((pageIndex - 1) * pageSize) + 1; // to get the number 11
You can see that a 0-indexed paging scheme is easier for developers, while 1-indexed pagination makes more sense to human users, because we start counting everything at 1.
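To make the difference concrete, here is a small sketch in TypeScript of both conversions (the function names and the slice-based access are just for illustration):

// Sketch: mapping a page number to a starting row in a 0-indexed list.
function startRowZeroIndexed(pageIndex: number, pageSize: number): number {
  return pageIndex * pageSize; // pageIndex = 0 is the first page
}

function startRowOneIndexed(pageIndex: number, pageSize: number): number {
  return (pageIndex - 1) * pageSize; // pageIndex = 1 is the first page
}

const records = Array.from({ length: 100 }, (_, i) => i + 1); // records 1..100

// The second page of 10 records is records 11..20 either way:
console.log(records.slice(startRowZeroIndexed(1, 10), startRowZeroIndexed(1, 10) + 10));
console.log(records.slice(startRowOneIndexed(2, 10), startRowOneIndexed(2, 10) + 10));
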

How to programmatically list available Google BigQuery locations?

I need a result similar to what is in the table of this page: https://cloud.google.com/bigquery/docs/locations.
As @shollyman has mentioned:
The BigQuery API does not expose the equivalent of a list locations call at this time.
So, you should consider filing a feature request on the issue tracker.
In the meantime, I wanted to add Option 3 to the two already proposed by @Tamir.
This is a slightly naïve option with its own pros and cons, but depending on your specific use case it can be useful and easily adapted to your application.
Step 1 - load the page's HTML (https://cloud.google.com/bigquery/docs/locations)
Step 2 - parse and extract the needed info
Obviously, this is super simple to implement in any client of your choice
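For example, a minimal sketch of those two steps in TypeScript, using the built-in fetch and a naïve regex (the parsing is intentionally simplistic, in the same spirit as the BigQuery query below, and not production-grade HTML scraping):

// Naïve sketch: fetch the locations page and pull the cells out of its table rows.
// Regex-based scraping is fragile by design, mirroring the simple approach above.
async function fetchBigQueryLocations(): Promise<string[][]> {
  const response = await fetch("https://cloud.google.com/bigquery/docs/locations");
  const html = await response.text();
  const rows: string[][] = [];
  for (const row of html.matchAll(/<tr>(.*?)<\/tr>/gs)) {
    const cells = [...row[1].matchAll(/<td[^>]*>(.*?)<\/td>/gs)]
      .map(m => m[1].replace(/<[^>]+>/g, "").trim()); // strip nested tags
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

fetchBigQueryLocations().then(rows => console.log(rows));
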
As I am a huge BigQuery fan, I went through a "proof of concept" using the BigQuery tool Magnus.
I've created a workflow with just two tasks:
API Task - to load the page's HTML into the variable var_payload
and
BigQuery Task - to parse and extract the wanted info out of the HTML
The "whole" workflow is as simple as it looks in below screenshot
The query I used in BigQuery Task is
CREATE TEMP FUNCTION decode(x STRING) RETURNS STRING
LANGUAGE js AS """
  return he.decode(x);
"""
OPTIONS (library="gs://my_bucket/he.js");
WITH t AS (
  SELECT html,
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(html, r'\n|<strong>|</strong>|<code>|</code>', ''),
      r'<table>(.*?)</table>'
    )[OFFSET(0)] x
  FROM (SELECT '''<var_payload>''' AS html)
)
SELECT
  pos,
  line[SAFE_OFFSET(0)] Area,
  line[SAFE_OFFSET(1)] Region_Name,
  decode(line[SAFE_OFFSET(2)]) Region_Description
FROM (
  SELECT
    pos,
    REGEXP_EXTRACT_ALL(line, '<td>(.*?)</td>') line
  FROM t,
    UNNEST(REGEXP_EXTRACT_ALL(x, r'<tr>(.*?)</tr>')) line WITH OFFSET pos
  WHERE pos > 0
)
As you can see, I used the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would ...
After the workflow is executed and those two steps are done, the result is in project.dataset.location_extraction, and we can query this table to make sure we've got what is expected.
Note: obviously, the parsing and extraction of the locations info here is quite simplified, and it can surely be improved to be more resilient to changes in the source page's layout.
Unfortunately, there is no API which provides the list of locations BigQuery supports.
I see two options which might be good for you:
Option 1
You can manually manage a list and expose it to your clients via an API or any other means your application supports (you will need to follow BigQuery product updates to keep this list current).
Option 2
If your use case is to provide the list of locations you are using to store your own data, you can call datasets.list to get a list of your datasets and display/use each dataset's location in your app. Each entry in the response looks like this:
{
  "kind": "bigquery#dataset",
  "id": "id1",
  "datasetReference": {
    "datasetId": "datasetId",
    "projectId": "projectId"
  },
  "location": "US"
}
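As a rough sketch of that approach in TypeScript: the endpoint is the real datasets.list REST path, but the project ID and access-token handling are placeholder assumptions (real code would use a Google auth library):

// Sketch: collect the distinct locations of your own datasets via datasets.list.
// PROJECT_ID and ACCESS_TOKEN are placeholders for illustration only.
const PROJECT_ID = "projectId";
const ACCESS_TOKEN = "your-oauth2-access-token";

async function listDatasetLocations(): Promise<string[]> {
  const url = `https://bigquery.googleapis.com/bigquery/v2/projects/${PROJECT_ID}/datasets`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
  });
  const body = await response.json();
  const locations = new Set<string>();
  for (const ds of body.datasets ?? []) {
    if (ds.location) locations.add(ds.location); // e.g. "US"
  }
  return [...locations];
}

listDatasetLocations().then(locs => console.log(locs));
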

How do you do pagination in GUN?

How do you do something like gun.get({startkey, endkey})?
Previously: https://github.com/amark/gun/issues/479
@qwe123wsx @sebastianmacias apologies for the delay! Originally posted at: https://github.com/amark/gun/issues/479
The wire spec has a protocol for this but it isn't implemented yet. It looks something like this:
gun.on('out', {get: {'#': {'>': 'a', '<': 'b'}}});
However, this doesn't work yet. I would recommend instead:
(1) Pagination behavior is very different from one app to another and will be hard for us to create a "one-size-fits-all" solution, so it would be highly helpful if you could implement your own* pagination and make it available as a user-module, then we can learn from your experience (what worked, what didn't) and make the best solution part of core.
(2) Your app will probably work fine without pagination in the meantime, while it is being built (it is targeted for after 1.0); then, as your app becomes more popular, it should be fairly easy to add in without much refactoring, once you need it and it is available.
... * How to build your own?
Lots of good articles on this; the best one I've seen yet is from Neo4j on how to do it in a graph database (which applies to gun as well): https://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html
Another rough idea is to model your data based on pagination or time. So rather than having ALL tweets go into the user's tweet table, the user's tweet table is a table of DAYS (or weeks), and you put each tweet inside its week table. Now when you load the data, you can scan/skip based on the week very easily, while staying super bandwidth-efficient.
Rough PSEUDO code:
function onTweetSend(tweet){
  // store the tweet under a year+week bucket instead of one giant list
  gun.get('user').get('alice').get('tweets').get(Date.uniqueYear() + Date.uniqueWeek()).set(tweet);
}
function paginateUserTweet(howMany, cb){
  // derive the names of the week buckets to read, going back from today
  var range = convertToArrayOfUniqueWeekNamesFromToday(howMany);
  var all = [];
  range.forEach(function(week){
    gun.get('user').get('alice').get('tweets').get(week).load(function(tweets){
      all.push(tweets);
      if(all.length < range.length){ return } // wait until every week has loaded
      all = flattenArray(all);
      cb(all);
    });
  });
}
Now we can use https://gun.eco/docs/RAD#lex
gun.get(...).get({'.': {'>': startkey, '<': endkey}, '%': 50000}).map().once(...)

phalcon querybuilder total_items always returns 1

I make a query via createBuilder(), and when executing it (getQuery()->execute()->toArray()) I get 10946 elements. I want to paginate it, so I pass it to:
$paginator = new \Phalcon\Paginator\Adapter\QueryBuilder(array(
    "builder" => $builder,
    "limit"   => $limit,
    "page"    => $current_page
));
$limit is 25 and $current_page is 1, but when doing:
$paginator->getPaginate();
$page->total_items;
it returns 1.
Is that a bug or am I missing something?
UPD: it seems that when counting items it uses the generated SQL with the LIMIT applied. Whatever the limit is, limit divided by items per page always equals 1. I might be mistaken.
UPD2: A colleague helped me figure this out; the bug is in the query Phalcon produces: the COUNT() runs over the GROUP BY, so it counts the rows within a group. A workaround looks like:
$dataCount = $builder->getQuery()->execute()->count(); // count the rows ourselves
$page->next = $page->current + 1;
$page->before = $page->current - 1 > 0 ? $page->current - 1 : 1;
$page->total_items = $dataCount;
$page->total_pages = ceil($dataCount / 100); // 100 = items per page used here
$page->last = $page->total_pages;
I know this isn't much of an answer, but this is most likely a bug. The great guys at Phalcon took on a massive job that is too big to do properly in their little free time, and things like PHQL, Volt and other big but non-core components do not receive as much attention as we'd like. Also, given that most time in the past 6 months was spent on v2, there are nearly 500 open bugs about things like that, and counting. I came across considerable issues in the ORM, Volt, Validation and Session, which in the end made me stick to other, not as cool but more proven, solutions. When v2 comes out I'm sure all attention will be on the bug list and testing; until then we are mostly on our own. Given that it's all C right now, only a few enthusiasts get involved; with v2 this will also change.
If this is the only problem you are hitting, the best approach is to update your query to get the information you need yourself without getPaginate().

Number of results of find-feature in sails.js restful api (in newer versions)

I started using the sails.js framework a few months ago because I need its RESTful API.
In the first version, a simple "http://domain.com:1337/mymodel" returned all datasets of the connected MySQL database; however, after an update to v0.10.xx it returns only the first 30 results.
I searched the sails.js changelog, documentation and various examples around the web and tried several ideas, but I can't figure out how to force sails.js to return all results again.
Has anybody a solution for this?
Use sails.config.blueprints.defaultLimit for general record limits. This also serves as the default limit for populated associations. There's technically no way at the moment to specify "no limit" for blueprints, but you can set the limit to the max number value as long as you don't have more than 9 quadrillion records :)
config/blueprints.js
defaultLimit: Number.MAX_VALUE // Set to highest possible value
Use populate_limit in your route config options to set the populate limit on a per-route basis.
config/routes.js
"GET /user": {blueprint: populate_limit: 10}
Use populate_[alias]_limit in your route config options to set the populate limit for a particular association on a per-route basis (e.g. populate_pets_limit: 10)
config/routes.js
"GET /user": {blueprint: 'find', limit: 20, populate_limit: 10, populate_pets_limit: 5}
I'll make sure this all gets added to the docs!
defaultLimit: -1 brings back all rows
If you need to change only the populate limit, you can use populate_limit in sails.config.blueprints:
// defaultLimit: 30
populate_limit: 999 // default value for the populate limit

Paging Lucene's search results

I am using Lucene to show search results in a web application. I am also implementing custom paging for showing them.
Search results could vary from 5000 to 10000 or more.
Can someone please tell me the best strategy for paging and caching the search results?
I would recommend you don't cache the results, at least not at the application level. Running Lucene on a box with lots of memory that the operating system can use for its file cache will help though.
Just repeat the search with a different offset for each page. Caching introduces statefulness that, in the end, undermines performance. We have hundreds of concurrent users searching an index of over 40 million documents. Searches complete in much less than one second without using explicit caching.
Using the Hits object returned from search, you can access the documents for a page like this:
Hits hits = searcher.search(query);
int offset = page * recordsPerPage; // page is 0-indexed here
int count = Math.min(hits.length() - offset, recordsPerPage);
for (int i = 0; i < count; ++i) {
    Document doc = hits.doc(offset + i);
    ...
}