SQL - maintain sort order for paginating data that changes in real time

I'm implementing a PHP page that displays data in paginated form. My problem is that the data changes in real time: when the user requests the next page, I submit the last id and run a query that retrieves 10 more rows after that id, ordered by a column whose value changes in real time. For example, I have 20 rows:
Id 1 col_real_time 5
Id 2 col_real_time 3
Id 3 col_real_time 11
Etc
I get the data sorted by col_real_time in ascending order, so the result is
id 2, id 1, id 3
Now, in real time, id 2's col_real_time changes to 29 before the user requests the next page. When the user then requests the next results, id 2 (whose value is now 29) shows up again, even though they have already seen it.
How can I handle this?

You basically have to take a snapshot of the data if you don't want it to appear to change under the user. This isn't something you can do very efficiently in SQL, so I'd recommend downloading the entire result set into a PHP session variable that can be persisted across pages. That way, you can just get rows on demand. There are JavaScript widgets that will effectively do the same thing, but they send the entire result set to the client, which is a bad idea if you have a lot of data.
This is not as easy as pure SQL pagination, as you have to take responsibility for cleaning out the stored variable when it's no longer needed. Otherwise, you'll rather quickly run out of memory.
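A minimal sketch of that session-snapshot approach, assuming PDO/MySQL and an items table with the question's col_real_time column (connection details and names are illustrative):

    <?php
    session_start();

    // Take the snapshot once, on the first page request.
    if (!isset($_SESSION['snapshot'])) {
        $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
        $stmt = $pdo->query('SELECT id, col_real_time FROM items ORDER BY col_real_time ASC');
        $_SESSION['snapshot'] = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $_SESSION['snapshot_created'] = time();
    }

    // Serve each page from the frozen snapshot, not from the live table.
    $page = max(0, (int)($_GET['page'] ?? 0));
    $rows = array_slice($_SESSION['snapshot'], $page * 10, 10);

    // Clean the stored variable out once it is stale (here: after 10 minutes),
    // or you will run out of memory as the answer warns.
    if (time() - (int)$_SESSION['snapshot_created'] > 600) {
        unset($_SESSION['snapshot'], $_SESSION['snapshot_created']);
    }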

If you have just a few pages, you could:
Save it to the session and page over it, instead of going back to the database server.
Save it to a JSON object list and use jQuery to read it and page over it.
Save it to a temp table recording generation timestamp, user_id and session_id, and page over it (a sketch follows this list).
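A sketch of the temp-table option, assuming MySQL and PDO; the items_snapshot table and all names are illustrative:

    <?php
    session_start();
    $pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $page   = max(0, (int)($_GET['page'] ?? 0));
    $userId = 0; // stand-in for your authenticated user id

    // Assumed one-time setup (MySQL):
    //   CREATE TABLE items_snapshot (
    //       session_id    VARCHAR(64) NOT NULL,
    //       user_id       INT         NOT NULL,
    //       generated_at  TIMESTAMP   NOT NULL,
    //       id            INT         NOT NULL,
    //       col_real_time INT         NOT NULL,
    //       PRIMARY KEY (session_id, id)
    //   );

    // Freeze a copy of the live table for this session (on page 1 only).
    if ($page === 0) {
        $pdo->prepare('DELETE FROM items_snapshot WHERE session_id = :sid')
            ->execute([':sid' => session_id()]);
        $ins = $pdo->prepare('INSERT INTO items_snapshot
            SELECT :sid, :uid, NOW(), id, col_real_time FROM items');
        $ins->execute([':sid' => session_id(), ':uid' => $userId]);
    }

    // Page over the frozen copy; the live table can keep changing meanwhile.
    $sel = $pdo->prepare('SELECT id, col_real_time FROM items_snapshot
        WHERE session_id = :sid ORDER BY col_real_time LIMIT 10 OFFSET :off');
    $sel->bindValue(':sid', session_id());
    $sel->bindValue(':off', $page * 10, PDO::PARAM_INT);
    $sel->execute();
    $rows = $sel->fetchAll(PDO::FETCH_ASSOC);

    // The cleanup responsibility mentioned above: drop stale snapshots.
    $pdo->exec('DELETE FROM items_snapshot WHERE generated_at < NOW() - INTERVAL 1 HOUR');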

Related

Best practice for pagination based on item updated time

Let's say I have 30 items in my db, and clientA makes an API call to get the first 10 records ordered by item updated time. Now think of a use case where clientB updates the 11th item. When clientA makes the API call for page 2 (items 11 to 20), the pagination breaks: because clientB updated the 11th item, that item moves to position 1 when ordered by updated time, 1 becomes 2, 2 becomes 3, ... and 10 becomes 11. So there is a chance that clientA will receive duplicate data.
Is there a better approach to this kind of problem?
Any help would be appreciated.
I think you could retrieve all elements each time, using no pagination at all, to prevent this kind of "false information" in your table.
If visualizing the actual values of each record is mandatory, you could also add a function to your API that works as a trigger: each time a user modifies a record, it broadcasts a message to all active sessions notifying them that some data has changed. As an example, think of Twitter's live feed: when a new bunch of tweets is created, Twitter notifies all users to reload the page if they want to see real-time information.
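True push notification needs a WebSocket layer; as a hedged stand-in, a minimal polling variant of the same idea (the data_version table and the seen parameter are invented for illustration):

    <?php
    // Assumed: a one-row table bumped on every write, e.g. inside the writer:
    //   UPDATE data_version SET version = version + 1;
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $current = (int)$pdo->query('SELECT version FROM data_version')->fetchColumn();
    $seen    = (int)($_GET['seen'] ?? 0); // version the client last rendered

    // Clients poll this endpoint every few seconds and reload on change.
    header('Content-Type: application/json');
    echo json_encode(['version' => $current, 'changed' => $current !== $seen]);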

Custom pagination in DataTables

I have a web application in which I get data from my database and show it in a DataTable. I am facing an issue because the data I am fetching has too many rows (200,000), so when I run a query like select * from table_name; my application gets stuck.
Is there a way to handle this problem with JavaScript? I tried pagination, but I cannot figure out how to do it, as DataTables paginates data that has already been rendered. Is there a way to run my query with pagination at the backend?
I came across the same problem when working with MongoDB and AngularJS, and I used server-side paging. Since you have a huge number of records, you can try the same approach. Assume you are displaying 25 records per page.
Backend:
Get the total count of the records using a COUNT query.
Use select * from table_name LIMIT 25 OFFSET ${req.query.pageNumber*25} to query a limited set of records based on the page number.
Frontend:
Instead of using DataTables, display the data in an HTML table itself.
Define buttons for next page and previous page.
Define a global variable in the controller/js file for pageNumber.
Increment pageNumber by 1 when the next-page button is clicked, and decrement it by 1 when the prev button is pressed.
Use the result of the COUNT query to put an upper limit on the pageNumber variable (if there are 200 records, the limit will be 200/25 = 8).
So basically select * from table_name LIMIT 25 OFFSET ${req.query.pageNumber*25} limits the number of records to 25. When req.query.pageNumber=1, it skips the first 25 records and sends the next 25. Similarly, when req.query.pageNumber=2, it skips the first 2*25 records and sends records 51-75.
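The snippets above are Node-flavored (req.query). A rough PHP/PDO equivalent of the same backend, with the COUNT-based upper limit applied server-side as well (a sketch; table and parameter names are illustrative):

    <?php
    $pdo     = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $perPage = 25;

    // Total count, which the frontend uses to cap the page number.
    $total    = (int)$pdo->query('SELECT COUNT(*) FROM table_name')->fetchColumn();
    $lastPage = max(0, (int)ceil($total / $perPage) - 1);

    // Clamp the requested page number to the valid range.
    $page = min(max(0, (int)($_GET['pageNumber'] ?? 0)), $lastPage);

    // Fetch only the 25 rows for this page.
    $stmt = $pdo->prepare('SELECT * FROM table_name LIMIT :lim OFFSET :off');
    $stmt->bindValue(':lim', $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':off', $page * $perPage, PDO::PARAM_INT);
    $stmt->execute();

    header('Content-Type: application/json');
    echo json_encode(['total' => $total, 'rows' => $stmt->fetchAll(PDO::FETCH_ASSOC)]);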
There are two ways to handle this.
First way - handle paging on the client side: get all the data from the database and apply custom paging.
Second way - handle paging on the server side: on every page change, call the database and get records according to the page size.
You can use the LIMIT and OFFSET clauses for pagination in MySQL. I understand that 200,000 rows at a time make performance slow. You mention that you have to use JS: to be clear, if you want JS as the frontend alone, it is not going to help you. But since you mention that you have a web application, if that application runs on Node (as the server), then I can suggest an approach that can help you a lot.
Use two variables, named var_pageNo and var_limit. Now use the raw MySQL query:
select * from <tbl_name> LIMIT var_limit OFFSET (var_pageNo * var_limit);
Write your code around this query, replacing the variables with your desired values. This will make performance faster and will fetch the data as per your specified limit.
Hope this helps.

How can I query data from an HBase table in milliseconds?

I'm writing an interface to query paginated data from an HBase table. I query the paginated data by some conditions, but it's very slow. My rowkey looks like this: 12345678:yyyy-mm-dd, i.e. 8 random digits plus a date. I tried using Redis to cache all the rowkeys and paginate over them, but it's difficult to query the data by the other conditions.
I also considered designing a secondary index in HBase, and I discussed it with colleagues; they think a secondary index is hard to maintain.
So, who can give me some ideas?
First thing: AFAIK, a random number + date rowkey pattern may lead to hotspotting once you scale to large data.
Regarding pagination:
I'd suggest Solr + HBase; if you are using Cloudera, that's Cloudera Search. It gives good performance (proven in our case) when querying 100 rows per page, and with a web-service call we populated an AngularJS dashboard.
Also, most importantly, you can move back and forth between pages without any issues.
To achieve this, you need to create collections (from the HBase data) and can use the SolrJ API.
HBase alone with the scan API doesn't work well for quick queries.
Apart from that, please see my answer to "How to achieve pagination in HBase?", which is more insightful, with implementation details.
An HBase-only solution could be Hindex (a coprocessor-based solution); the link explains it in more detail.
In HBase, to achieve good read performance you want your data retrieved by a small number of gets (requests for a single row) or a small scan (a request over a range of rows). HBase stores your data sorted by key, so the most important idea is to come up with a row key that allows this.
Your key seems to contain only a random integer and a date, so I assume that your queries are about paginating over records marked with a time.
The first idea is that in a typical pagination scenario you access just one page at a time and navigate from page 1 to page 2 to page 3, etc. Given that you want to paginate over all records for date 2015-08-16, you could use a scan of 50 rows with start key '\0:2015-08-16' (which is smaller than any row key in 2015-08-16) to retrieve the first page. After retrieving the first page you have the last key of that page, say '12345:2015-08-16'. You can use it (or '12346:2015-08-16') as the start key of another 50-row scan to retrieve page 2, and so on. Using this approach you query your pages fast, as a single scan with a predefined number of returned rows. You can pass the last row key of a page as a parameter to the paging API, or just put the last row key in Redis so the next paging API call will find it there.
All this works perfectly well until some user comes in and clicks directly on page 100, or tries to jump to page 5 from page 2. In such a scenario you can use a similar scan with nSkippedPages * 50 rows. This will not be as fast as sequential access, but it's not the usual page-access pattern. You can then use Redis to cache the last row of each page result in a structure like pageNumber -> rowKey. Then, if the next user comes and clicks on page 100, they will see the same performance as in the usual click-page-1, click-page-2, click-page-3 scenario.
To make things faster for the users who click on page 99 first, you could write a separate daemon that retrieves every 50th row and puts the results in Redis as a page index. Launch it every 10-15 minutes, and state that your page index has at most 10-15 minutes of stale data.
You can also design a separate API that preloads the row keys for a bulk of N pages (say about 100 pages; it could be async, i.e. not waiting for the actual preload to complete). It would just do a scan with a KeyOnlyFilter and 50*N results, then select the row keys for each page. It accepts a row key and populates the Redis cache with row keys for N pages. Then, when a user lands on the first page, you fetch the row keys of the first 100 pages for them, so that when they click on any page link shown on the page, that page's start row key is available. With the right preload bulk size you can approach your required latency.
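A sketch of that page-token flow, in PHP for consistency with the rest of this page. hbaseScan() is a hypothetical stand-in for your HBase client (Thrift gateway, REST, ...), and the Redis key layout is invented; Redis access uses the phpredis extension:

    <?php
    // Hypothetical: scan $limit rows starting at $startRow and return the rows
    // plus the last row key, e.g. ['rows' => [...], 'lastKey' => '12345:2015-08-16'].
    function hbaseScan(string $startRow, int $limit): array {
        throw new RuntimeException('wire up your HBase client here');
    }

    function fetchPage(Redis $redis, int $pageNo, string $date): array {
        // Page 1 starts below any real key for the date; later pages start just
        // after the previous page's cached last key (assumes that page was
        // visited, or preloaded by the daemon/bulk API described above).
        $start = ($pageNo === 1)
            ? "\0:$date"
            : $redis->get("pagekey:$date:" . ($pageNo - 1)) . "\0";

        $result = hbaseScan($start, 50);

        // Cache this page's last row key so the next click is one short scan.
        $redis->set("pagekey:$date:$pageNo", $result['lastKey']);
        return $result['rows'];
    }

    $redis = new Redis();          // phpredis extension
    $redis->connect('127.0.0.1');
    $rows = fetchPage($redis, 1, '2015-08-16');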
The limit can be implemented using PageFilter (or, in recent HBase versions, Scan#setLimit()).
"skip nPages * 50 rows" and especially "output every 50th row" functionality seems to be trickier e.g. for latter you may end-up performing full scan which retrieves the keys or writing map-reduce to do it and for first it is not clear how to do it without sending rows over network since request can be distributed across several regions.
If you are looking for secondary indexes that are maintained in HBase, there are several open-source options (Splice Machine, Lilly, etc.). You can do index lookups in a few milliseconds.

How to fetch data for a news-feed-like system?

I have a few tables, as shown below.
Polls
PollId Question Option
1 What 1
2 Why 4
Updates
UpdateId Text
1 Sleep
2 Play
Polls and Updates are just two sample tables (in reality there are more tables: photos, videos, links, etc.). When a user visits his home page (like the Facebook news feed) he must be shown data relevant to him (no such data is included in this example). That is, I want to select data from all the tables with a small number of query executions (I want to present a mixture of data: polls, photos, videos, etc.).
Currently, I'm fetching only the ids and the type (i.e. which table) from all of the tables and gathering further data while iterating through this result set (i.e. from C#, calling another SqlQuery).
Is there a way to query the data from all the tables at once (OUTER JOIN? UNION?)?
Or simply:
How can I select different types of entities at once in a single SQL query?
You could write your query so that you have one long select list for everything you want, and it would all come back in one result set, but I suspect that wouldn't work too well, because you might have varying numbers of the different types of items per user.
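For concreteness, a sketch of that single-result-set shape using the question's Polls and Updates tables (the shared column layout and the type column are invented for illustration):

    <?php
    // Columns are padded/renamed to a common shape; a type column tells the
    // client how to render each row.
    $sql = "SELECT 'poll' AS item_type, PollId AS id, Question AS text FROM Polls
            UNION ALL
            SELECT 'update', UpdateId, Text FROM Updates
            ORDER BY item_type, id";
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $feed = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);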
If you really must have it all in one hit, you can issue multiple queries in one go and get multiple result sets back. To handle this in ADO.Net you can use a DataSet. See this SO example (but not the accepted answer - see Vikram Dibyal's answer, as it gives a very basic overview of what I think you're asking for).
I won't copy and paste the stuff from the linked thread; just head over and take a look.
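The linked thread is ADO.Net/C#; for the PHP/MySQL stack used elsewhere on this page, a hedged analog of "multiple queries in one go, multiple result sets back" is mysqli::multi_query (credentials and exact column lists are illustrative):

    <?php
    $db = new mysqli('localhost', 'user', 'pass', 'app');

    // Several statements in one round trip; one result set comes back per table.
    $sql = 'SELECT PollId, Question, `Option` FROM Polls;
            SELECT UpdateId, Text FROM Updates';

    $sets = [];
    if ($db->multi_query($sql)) {
        do {
            if ($result = $db->store_result()) {
                $sets[] = $result->fetch_all(MYSQLI_ASSOC);
                $result->free();
            }
        } while ($db->more_results() && $db->next_result());
    }
    // $sets[0] now holds the polls, $sets[1] the updates.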

Dealing with gaps in timeline

I'm looking for some assistance in sorting out the logic for how I am going to deal with gaps in a feed timeline, much like what you see in various Twitter clients. I am not creating a Twitter client, however, so it won't be specific to that API. I'm using our own API, so I can possibly make some changes there as well to accommodate this.
I'm saving each feed item in Core Data. For persistence, I'd like to keep the feed items around. Let's say I fetch 50 feed items from my server. The next time the user launches the app, I request the latest feed items, get 50 back, and do a fetch to display the feed items in a table view.
Enough time may have passed between the two server requests that a time gap exists between the two sets of feed items.
50 new feed items (request 2)
----- gap ------
50 older feed items (request 1)
* end of items in core data - just load more *
I keep track of whether a gap exists by comparing the oldest timestamp of the feed items from request 2 with the newest timestamp of the feed items from request 1. If the oldest timestamp from request 2 is greater than the newest timestamp from request 1, I can assume a gap exists and I should display a cell with a button to load 50 more. If the oldest timestamp from request 2 is less than or equal to the newest timestamp from request 1, the gap has been filled and there's no need to display the loader.
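The question is iOS/Core Data, but the comparison itself is tiny; sketched here in PHP for consistency with the rest of this page (the timestamp field name is illustrative):

    <?php
    // A gap exists when the oldest item of the newer batch is still newer
    // than the newest item of the older batch.
    function gapExists(array $newerBatch, array $olderBatch): bool {
        $oldestOfNewer = min(array_column($newerBatch, 'timestamp'));
        $newestOfOlder = max(array_column($olderBatch, 'timestamp'));
        return $oldestOfNewer > $newestOfOlder;
    }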
My first issue is the overall logic for keeping track of whether or not to display the "Load more" cell. How would I know where to display this gap? Do I store it as the same NSManagedObject entity as my feed items, with an extra bool plus a timestamp that lies between the two above, and then change the UI accordingly? Would there be another, better solution here?
My second issue is related to multiple gaps:
50 new feed items
----- gap ------
174 older feed items
----- gap ------
53 older feed items
* end of items in core data - just load more *
I suppose it would help in this case to go with an NSManagedObject entity, so I can just do regular fetches in Core Data, and if gap objects show up among the results, display them as loading cells and remove them accordingly (once the gaps between sets have been filled).
I'd ultimately want to wipe the objects after a certain amount of time has passed, as the user probably wouldn't go back that far in time, and if they do I can always fetch the items from my server again.
Any experience and advice anybody has with this subject would be greatly appreciated!