Lucene web paging - lucene

I am creating a web app with Lucene and need to implement paging. I have seen the various examples here that use an offset on the collector, but those seem to be old. I have also seen the IndexSearcher method searchAfter, which I believe was added in Lucene 3.5 (or 3.6, I can't remember which). However, it requires you pass it the last ScoreDoc. Because this is a web app, I have no way to pass the last result (as a ScoreDoc object) to the next request. So, my question is how is this typically done?
The only way I have come up with is to add a unique key to the index at build time, then pass that key as a POST parameter when requesting the next page. I would then have to search for that key to get the document id and pull that document to use with searchAfter. I think I have to use my own unique key because I cannot rely on the document id staying the same. Am I correct on this?
If there are better ways, please let me know. This is my first attempt at Lucene.

However, it requires you pass it the last ScoreDoc. Because this is a web app, I have no way to pass the last result (as a ScoreDoc object) to the next request. So, my question is how is this typically done?
I don't understand your problem. If you want to use searchAfter, just construct a ScoreDoc to pass to it. Your web app can pass ints and floats, right?
/** Constructs a ScoreDoc. */
public ScoreDoc(int doc, float score) {
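A minimal sketch of how those two values could travel between requests as a single URL-safe token. PageCursor, encode and decode are hypothetical names, not part of Lucene, and the sketch only covers the round trip. Whether searchAfter accepts a ScoreDoc rebuilt this way is disputed elsewhere in this thread, and the document id is only meaningful while the index has not changed.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

/** Hypothetical helper (not a Lucene class): carries the last hit's doc id and score. */
final class PageCursor {
    final int doc;
    final float score;

    PageCursor(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }

    /** Packs the raw int and float bits into an 8-byte, URL-safe Base64 token. */
    static String encode(int doc, float score) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(doc).putFloat(score);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    /** Reverses encode(); rejects tokens that do not decode to exactly 8 bytes. */
    static PageCursor decode(String token) {
        byte[] raw = Base64.getUrlDecoder().decode(token);
        if (raw.length != 8) {
            throw new IllegalArgumentException("malformed cursor token");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new PageCursor(buf.getInt(), buf.getFloat());
    }

    public static void main(String[] args) {
        // The token would be sent to the client with the page and posted back
        // with the request for the next page.
        String token = encode(42, 1.2345f);
        PageCursor c = decode(token);
        // Assumption: the next request would then build new ScoreDoc(c.doc, c.score).
        System.out.println(token + " -> doc=" + c.doc + " score=" + c.score);
    }
}
```

Sending the float's raw bits rather than a formatted decimal avoids any rounding on the way through the browser, so the score that comes back is bit-for-bit the one that was sent.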

As far as I'm aware, what you are doing at the moment is correct. A ScoreDoc which you construct yourself using ints and floats will not work. See my similar question:
Working Lucene SearchAfter Example

Related

How to specify a Take value with TableClient.QueryAsync

I'm updating my project to use Azure.Data.Tables 12.6.1, and I can't figure out where to specify a Take value to limit the number of entities returned from a query.
In other words, I want to do something like this:
var limit = 150;
var results = table.QueryAsync<T>(limit);
await foreach (var page in results.AsPages().ConfigureAwait(false)) {
// Regardless of how the server pages the results,
// only the top [limit] items are returned.
}
In the old API, you could set a Take property on the query object. How do I do this in the new API?
As @Skin points out, the current SDK does not expose an explicit API for Take, but that was an intentional decision to make it clearer to developers what is really happening from the service perspective.
The old SDK supported a full IQueryable API, which made it easy to create very expensive queries that performed filtering client side after fetching the whole table from the service.
Although Take doesn't have the same problems as other LINQ methods, the service doesn't really support it. The service can only limit the number of results per page (which it caps at 1000).
While I agree it is not as simple as a Take API, the current API makes it fairly straightforward to implement the equivalent functionality, while not hiding the fact that you may actually fetch more than your Take limit from the service.
This sample demonstrates how to iterate through the pages with a max items per page set.
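The equivalent of Take described above is independent of the SDK, so here is a rough, language-neutral sketch of it in Java. Plain lists stand in for the pages returned by the service; TakeAcrossPages and take are hypothetical names and nothing here is an Azure.Data.Tables call. It assumes the pages arrive through a lazy Iterable, so returning early means no further page is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: emulates Take on top of a paged result. */
final class TakeAcrossPages {

    /** Collects at most `limit` items, stopping as soon as the limit is reached. */
    static <T> List<T> take(Iterable<List<T>> pages, int limit) {
        List<T> out = new ArrayList<>();
        if (limit <= 0) {
            return out;
        }
        for (List<T> page : pages) {
            for (T item : page) {
                out.add(item);
                if (out.size() == limit) {
                    // Leaving here means the next page is never pulled
                    // from the iterator.
                    return out;
                }
            }
        }
        return out; // fewer than `limit` items existed
    }

    public static void main(String[] args) {
        // Three stand-in pages with a per-page maximum of 3 items.
        List<List<Integer>> pages = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6),
                Arrays.asList(7));
        System.out.println(take(pages, 5)); // prints [1, 2, 3, 4, 5]
    }
}
```

Setting the per-page maximum to the smaller of the limit and the service cap keeps the over-fetch on the last page small, which is the trade-off the answer alludes to.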
This may be a little controversial but I'm going to add this as an answer ... it looks like it's been raised as a feature request only a number of weeks ago and it's now been added to the backlog ...
https://github.com/Azure/azure-sdk-for-net/issues/30985
You're not the only one who has the same requirement.

How to intercept RavenDb Session.SaveChanges()

I am looking for a way to intercept Session.SaveChanges() so that I may execute some extra work using the same session instance (this is handy in some cases).
Edit: The point about re-using the session is that I have more work that needs to run in the same transaction.
I am already aware of (and make use of) IDocumentStoreListener - but this interface doesn't help because it does not give me access to the current session.
I can't find anything in RavenDb documentation about a way to intercept the call to SaveChanges and get a handle on the current session. Does anyone know of a way?
Open a new session; it's free (in terms of performance). I think IDocumentStoreListener was designed for what you're looking for, and I don't know of anything else that works the way you describe.
By implementing
void AfterStore(string key, object entityInstance, RavenJObject metadata);
you get all the information about the stored entity, and then you can do what you need.

Change results URL in Alfresco Aikau faceted search page

I am having some difficulty customizing the Aikau faceted search page on Alfresco, which may be more a matter of my lack of knowledge about dojo/AMD.
What I want to do is to replace the document details page URL by a download URL.
I extended the Search Results Widget to include my own custom module:
var searchResultWidget = widgetUtils.findObject(model.jsonModel, "id", "FCTSRCH_SEARCH_RESULT");
if (searchResultWidget) {
    searchResultWidget.name = "mynamespace/search/CustomAlfSearchResult";
}
I understand search results URLs are rendered this way:
AlfSearchResult module => uses SearchResultPropertyLink module => mixins _SearchResultLinkMixin renderer => bring the "generateSearchLinkPayload" function => renders URLs depending on the result type
I want to override this "generateSearchLinkPayload" function but I can't figure out what is the best way to do that.
Thanks in advance for the help !
This answer assumes you're able to use the latest version of Aikau (at the time of writing this is 1.0.61). Older versions might require slightly different overriding...
In order to do this you're going to need to override the createDisplayNameRenderer function of AlfSearchResult in your CustomAlfSearchResult widget. This will allow you to create an extension of alfresco/search/SearchResultPropertyLink.
If you want to take advantage of the download capabilities provided by the alfresco/services/DocumentService for downloading both documents and folders (as a zip) then you're going to want to change both the publishTopic and publishPayload of the SearchResultPropertyLink.
You should extend the getPublishTopic and generateSearchLinkPayload functions. For the getPublishTopic function you'll want to change the return value to be "ALF_SMART_DOWNLOAD" (there are constants available for these strings in the alfresco/core/topics module). This topic can be used to tell the DocumentService to take care of figuring out if the node is a folder or document and will make an XHR request for the full node metadata (in order to get the contentUrl attribute that is not included in the data returned by the Search API).
You should extend the generateSearchLinkPayload function so that, for document or folder types, the payload contains a nodes attribute: a single-element array holding the search result object that you wish to download.
I would recommend that you call this.inherited first to get the default payload and only update it for documents and folders.
Hopefully that all makes sense - if not, add a comment and I'll try to provide further assistance!
This is the answer for 1.0.25.2 - unfortunately it's not quite so straightforward...
You still need to extend the alfresco/search/AlfSearchResult widget, however this time you need to extend the postCreate function (remembering to call this.inherited(arguments)). It's not possible to stop the original alfresco/search/SearchResultPropertyLink widget from being created... so it will be necessary to find it and destroy it.
The widget is not assigned to a variable, so it will be necessary to find it using dijit/registry. Use the byNode function from dijit/registry to find the widget assigned to this.nameNode and then call destroy on it (be sure to pass the argument true to preserve the DOM). However, you will then need to empty the DOM node so that you can start again...
Now you need to add in your extension to alfresco/search/SearchResultPropertyLink. Unfortunately, because the smart download capability is not available you'll need to do more work. The difference here is that you'll need to make an XHR request to retrieve the full node metadata in order to obtain the contentURL. It's possible to publish a request to the DocumentService (via the "ALF_RETRIEVE_SINGLE_DOCUMENT_REQUEST" topic). However, you need to be aware that having the XHR step will not allow you to then proceed with the download as is. Instead you'll need to use an iframe download solution. I'd suggest you take a look at the changes in the pull request we recently made to solve this problem and backport them into your own solution.

Paging: Find Object-Position (Page) to Display

I'm using .take() and .skip() for paging with a table.
Now when I "insert" an entity into the database, I reload my table (new query). Now I would like to jump to this new object inside the table and highlight it.
Is there an elegant solution to find on what page the new object is and then use skip/take to jump to the correct page?
Edit:
Maybe Breeze/OData could natively support paging by letting you specify a page size in the query and which page to deliver (instead of using take and skip and calculating it on the client).
If this was the case, the parameter for "which page to deliver" could, instead of being an integer, also be a sub-query which would be executed on the resulting data before it gets "taked and skipped" to find out, on which page the object(s) are visible and use this as "page to deliver".
Edit 2:
Added the idea to Breeze UserVoice: https://breezejs.uservoice.com/forums/173093-1-breezejs-feature-suggestions/suggestions/6824937-support-paging-natively
It's an interesting idea, but I don't know of any really elegant solution to accomplish this. Conceptually, how would you expect this to work under the covers?
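Short of native support, the arithmetic itself is simple once the position of the new entity in the sorted result is known. The sketch below assumes that position can be obtained separately, for example by counting the entities that sort before the new one. That count query is an assumption, not a Breeze paging feature, and PageLocator and its methods are hypothetical names.

```java
/** Hypothetical helper: maps an item's position in the sorted result to a page. */
final class PageLocator {

    /** Zero-based page containing the item at zero-based position `index`. */
    static int pageOf(int index, int pageSize) {
        if (index < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("index must be >= 0 and pageSize > 0");
        }
        return index / pageSize; // integer division rounds down
    }

    /** Value to pass to skip() so that the page containing `index` is loaded. */
    static int skipFor(int index, int pageSize) {
        return pageOf(index, pageSize) * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 10;
        int index = 37; // assumed: 37 entities sort before the new one
        System.out.println("page " + pageOf(index, pageSize)
                + ", skip " + skipFor(index, pageSize)
                + ", take " + pageSize); // prints page 3, skip 30, take 10
    }
}
```

The cost is one extra round trip for the count before the page itself is requested.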

Slow results when using webkitSpeechRecognition vs x-webkit-speech?

I'm new to using this API and wasn't able to find an answer to what I'm running into.
When I use new webkitSpeechRecognition, and use the onresult event to find isFinal == true, it seems to take longer to reach the final result than using x-webkit-speech in an input tag.
Does anyone know if Google is doing something specific to get a speedier result? Or do I need to set an attribute on the webkitSpeechRecognition object?
Thanks for any insight!
See my answer, which explains how, in continuous mode, results are triggered by new voice input or otherwise show up only after a timeout.
In non-continuous mode, the result will show up much faster.