MongoDB infinite scroll sorted results - ruby-on-rails-3

I am having a problem trying to achieve the following:
I'd like to have a page with 'infinite' scrolling functionality and all the results fetched to be sorted by certain attributes. The way the code currently works is, it places the query, sorts the results, and displays them. The problem is, that once the user reaches the bottom of the page and new query is placed, the results from this query are sorted, but in its own context. That is, if you have a total of 100 results, and the first query display only 50, then they are sorted. But the next query (for the next 50) sorts the results only based on these 50 results, not based on the 100 (total results).
So, do I have to fetch all the results at once, sort them, and then apply some pagination logic to them or there's a way for MongoDB to actually have infinite scrolling (AJAX requests) with sorting applying to the results?

There's a few ways to do this with MongoDB. You can use the .skip() and .limit() commands (documented here: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-CursorMethods) to apply pagination to the query.
Alternatively, you could add a clause to your query like: {sorted_field : {$gt : <value from last record>}}. In other words, filter out matches of the query whose sorted value is less than that of the last resulting item from the current page of results. For example, if page 1 of results returns documents A through D, then to retrieve the next page 2 you repeat the same query with the additional filter x > D.

Let me preface this by saying that I have no experience with MongoDB (though I am aware that it is a NoSQL database).
This question, however, is somewhat of a general database one (you'd probably get more responses tagging it as such). I've implemented such a feature using Cassandra (another, albiet quite different NoSQL database), however the same principles apply.
Use the sorted-by attribute of the last retrieved record, and conduct a range search based on it in the database. So, assuming your database consists of the following set of letters:
A
B
C
D
E
F
G
..and you were retrieving 2 letters at a time, you'd retrieve A, B first. When more records are needed, you'd use B to conduct a range search on the set of letters in the database. In plain English this would be something like:
Get the letters that appear after B, limit the results to 2
From a brief look at the MongoDB tutorial, it looks like you have conditional operators to help you implement this.

Related

Pulling a 33,000-record recordset took LESS execution time than using Count() in the SQL. How is that possible?

Thanks in advance for putting up with me.
Pulling a 33,000-record recordset from the database took LESS execution time than using Count() in the SQL and just grabbing 20 rows.
How is that possible?
A bit more detail:
Before, we were grabbing the entire recordset yet only displaying 20 rows of it on a page at a time for pagination. That was cringeworthy and wasteful, so I redesigned the page to only grab 20 rows at a time and to simply use an index variable to grab the next page, and so on.
All well and good, but that lacked a record count, which our people needed.
So after the record query, I added (what I thought would be) a quick query just on the index of the table using the Count(index) function in Structured Query Language.
A side by side comparison of the original page and my new page indicates my new page takes roughly 10% longer to execute than the original! I was flabbergasted. I thought for sure it would be lightning fast, way faster than the original.
Any thoughts on why and what I might do to remedy that?
Is it because the script has to run two queries, regardless of the data retrieved?
Update:
Here is the SQL.
(Table names and field names are fictionalized in this post for security, but the structure is the same as the real page).
The main recordset select query contains:
SELECT
top 21 roster_id, roster_pplid, roster_pplemailid, roster_emailid, roster_firstname,
roster_lastname, roster_since, roster_pplsubscrid, roster_firstppldone, roster_pmtcurrent,
roster_emailverified, roster_active, roster_selfcanceled, roster_deactreason
FROM roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
order by roster_id desc
The record count query contains:
SELECT
COUNT(roster_id)
FROM
roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
The first query runs, then the second. The second always dynamically has the same matching WHERE filter.
I think I know why it is slower, I'm using GetRows to grab the recordset in the new page, was not using that in the old page. That seems to be the slowdown. But I have to use it, cannot step beyond the 21st record otherwise.
Nick.McDermaid : The SQL shown is selecting the TOP 21 rows, that is how it is grabbing just 20 rows (number 21 is just to populate the index for the "Next" page link).

How to fetch data for a news feed like system?

I have few tables as shown below
Polls
PollId Question Option
1 What 1
2 Why 4
Updates
UpdateId Text
1 Sleep
2 Play
Polls and updates are just two sample tables (In reality there are more tables like ,photos, videos,links etc). But when a user visit his home (like facebook new feed) he must be displayed with data relevant to him (no such data included in this example). ie I want to select data from all tables with less number of query executions. (ie, I want to present a mixture of datas, ie polls, photos, videos etc )
Currently, I'm fetching only ids and type (ie which table) from all of the tables and gather further data while iterating through this resultset. (ie from c# calling another SqlQuery) .
Is there a way to query the data from whole tables at once? (OUTER JOIN?, UNION?)
Or simply,
How can I select different type of entities at once in a single sql Query?
You could write your query so that you have one long select list for everything you want and it all comes back in one result set but I suspect that wouldn't work too well because you might have varying numbers of different types of items per user.
If you really must have it all in one hit then you can issue multiple queries in one go and get multiple result sets back. To handle this you can use an ADO.Net DataSet. See this SO example (but not the accepted answer - see Vikram Dibyal's answer as that gives a very basic overview of what I think you're asking for).
I won't copy and paste the stuff from the linked thread, just head over and take a look.

SQL select certain number of rows

Hello I need a SQL query statement that gets me rows 'start' to 'finish'.
For example:
A website with many items where page 1 selects only items 1-10, page 2 has 11-20 and so on.
I know how to do this with Microsoft SQL Server and MySQL but I need an implementation that is platform independent. :/
I have an Increment line for IDs but deleting in-between will mess the result when I select via
WHERE ID > number AND ID < othernumber
of course
Is this possible without fetching the whole database to a ResultSet?
I think your safest bet would be to use the BETWEEN operator. I believe it works across Oracle/MySQL/MSSQL.
WHERE ID BETWEEN number AND othernumber
Concerning your comment " I was just think for the case when first 100 IDs are gone I'll have to check further until there is something to fetch", you might wanna consider NOT actually ever deleting stuff from your database but to add a flag like "active" or something like that to your tables so you can avoid situations like the one you're now trying to avoid. The alternative is where you are now, having to find the max and min rows in a filter

How can I summarize and reuse a complex dataset

How can I re-use a single complex dataset across a number of tables?
The dataset has a number of computed columns that needs to be reported both in detail and in summary. Here's a very simplified example dataset:
is_food sale_association food_type total_sold total_associations percent_total
1 Before Movie Popcorn 50 3 x BirtMath.safeDivide(...)
0 Before Movie Soda 10 2 x BirtMath.safeDivide(...)
1 During Movie Jujubee 10 1 x BirtMath.safeDivide(...)
0 After Movie Soda 15 2 x BirtMath.safeDivide(...)
From this one dataset, I'd want to create a detailed summary of all food types while rolling up non food (using the 'is_food' column), another summary of all food types, another detailed summary of food with rolled up non-food by sale_association, etc. etc.
The report would also contain a number of percentages (6 in the most complex table) that need to be calculated (some across a row, others across all rows in a given group), all of which can have a zero value for the denominator and so need to be guarded against with safeDivide (which is a PITA to do in the source SQL query which itself is doing aggregation -- checking for divide by zero when both the numerator and denominator are sums leads to hairy queries).
Obviously I can do this by focusing the() SQL query as appropriate, but it seems like a waste of time and effort to create 12 or 15 queries that are very similar when I've already managed to create the monster query for the most detailed table.
What doesn't seem straightforward is how to perform the rollups in a table. I managed to hack something together by hiding rows that would later be summed up (e.g. "is_food == 0" in the example) and then creating custom data bindings that are displayed in a footer row. Not only does it feel like a hack, it also interferes with the ability to naturally order rows. Again, going back to the example, if I was ordering by total_sold and summarizing rows with is_food == 0, the natural order should be Popcorn, Non-food, Jujubee.
There's nothing in the BIRT wiki about this, nor does "BIRT: A Field Guide, 3rd E." really delve into the topic.
This seems like a fairly open-ended question (although I agree that re-using a single dataset makes much more sense than having multiple queries retrieving the same data in slightly different ways). A few general suggestions:
Use the most detailed version of the data required as a common dataset for each BIRT report item (typically BIRT tables)
Where summary-only level reporting is required, add groups to the BIRT table at the desired level, add data items as required to the group headers/footers and delete the detail level row(s) from the BIRT table.
Where detail-level reporting is required in some cases (eg. for food items but not for non-food items), add groups to the BIRT table as above, and set the visibility of the detail row (in Property Editor - Properties - Visibility) to check Hide Element, then specify the appropriate expression to suppress the non-required rows (non-food items, in this example).
Aggregations (ie. summary expressions) can be added to tables by selecting the whole table, selecting the Binding tab within the Property Editor and clicking the Add Aggregation... button.

Solr: Search in multiple fields BUT STOP if documents match was found

I want to search in multiple fields in Solr.
(In know the concept of the copy-fields and I know the (e)dismax search handler.)
So I have an orderd list of fields, I want the terms to be searched against.
1.) SKU
2.) Name
3.) Description
4.) Summary
and so on.
Now, when the query matches a term, let's say in the SKU field, I want this match and no further searches in the proceeding fields.
Only, if there are NO matches at all in the first field (SKU field), the second field (in this case "name") should be used and so on.
Is this possible with Solr?
Do I have to implement my own Lucene Search Handler for this?
Any advice is welcome!
Thank you,
Bernhard
I think your case requires executing 4 different searches. If you implement you very own SearchHandler you could avoid penalty of search result accumulation in 4 different request. Which means, you would send one query, and custom SearchHandler would execute 4 searches and prepare one result set.
If my guess is right you want to rank the results based on the order of the fields. If so then you can just use standard query like
q=sku:(query)^4 OR name:(query)^3 OR description:(query)^2 OR summary:(query)
this will rank the results by the order of the fields.
Hope is helps.