Newly inserted documents in RavenDB not showing up in searches

I have a standard install of RavenDB and am running into some problems after I insert a new document.
If I then search for the document or try to pull it by its Id, there is about a 25% chance that it's missing from the search results or that retrieving it by Id throws an error. When I open up Raven Studio I can see that the document exists, so what's the deal?
Is this because whatever index it is using to find the document hasn't been updated yet? How can I ensure that I am always querying the latest data so that this doesn't happen?

Yes, it looks like this is due to stale indexes. There is a way to check whether there are pending index operations, which you can use to make sure you are querying the latest data. This article describes how to do that:
http://ravendb.net/docs/article-page/3.0/csharp/indexes/stale-indexes
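For example, with the C# client you can tell a query to wait until its index is no longer stale before returning. This is a minimal sketch; the Order class, URL, and database name are placeholders rather than anything from the question:

    using System;
    using System.Linq;
    using Raven.Client;
    using Raven.Client.Document;

    public class Order
    {
        public string Id { get; set; }
        public string Company { get; set; }
    }

    class Program
    {
        static void Main()
        {
            using (var store = new DocumentStore { Url = "http://localhost:8080", DefaultDatabase = "MyDb" }.Initialize())
            using (var session = store.OpenSession())
            {
                // Block (for up to 5 seconds) until the index has caught up
                // with all writes made before this query was issued.
                var results = session.Query<Order>()
                    .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromSeconds(5)))
                    .Where(o => o.Company == "companies/1")
                    .ToList();
            }
        }
    }

The client also has a store-wide convention (store.Conventions.DefaultQueryingConsistency) to apply this to every query, but the per-query customization keeps the waiting explicit. Note that loading by Id with session.Load() goes straight to the document store rather than through an index, so it sidesteps staleness for the retrieve-by-Id case.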

Related

Solr optimization job is not deleting logically deleted documents

I've two questions:
I've tried optimizing with the following command:
curl 'http://hostname:port/solr/<core>/update?optimize=true&maxSegments=N&waitSearcher=false'
But when one segment is the largest and contains both live docs and deleted docs, the Solr optimization job is not able to remove those logically deleted docs; it merges the current segments down to the requested segment count, but the deleted doc count stays the same as before.
Also, when a core already has a certain segment count, I'm not able to optimize it again with the same 'maxSegments=N'. Can optimization not be performed when the requested segment count is the same as the core's current segment count?
Please share best practices for this and tell me what I'm doing wrong.
Thanks in advance!
Starting with Solr 7.5, there was a change in the behaviour of merging segments. Merging segments is what "optimize" does, including removal of deleted documents, so you were on the right path. But starting with 7.5, segments are only merged if certain criteria are fulfilled.
Please review this article (found via an email thread in the Solr community):
https://lucidworks.com/post/solr-optimize-merge-expungedeletes-tips/
I had the same issue. After reading the article, I set "maxSegments=1", and this made "optimize" do the desired job, since it enforces the old behaviour.
So it should work with your instance as well if you specify "maxSegments=1" instead of "maxSegments=N".
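For example, reusing the placeholders from the command in the question (<core> stands for your core name):

    curl 'http://hostname:port/solr/<core>/update?optimize=true&maxSegments=1&waitSearcher=false'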

BigQuery - 1 file duplicated out of many using the Java API

I am using the Java API quick start program to upload CSV files into BigQuery tables. I uploaded more than a thousand files, but in one of the tables one file's rows are duplicated. I went through the logs and found only one entry for its upload. I also went through many similar questions where @jordan mentioned that the bug is fixed. Can there be any reason for this behavior? I have yet to try the suggested solution of setting a job id, but I just could not find any reason on my side for the duplicate entries...
Since BigQuery is append-only by design, you need to accept that there will be some duplicates in your system and write your queries in such a way that they select only the most recent version of each row.
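For example, in BigQuery standard SQL, something like the following keeps only the newest row per key. This is a sketch: the id and updated_at columns and the table name are assumptions about your schema, not known facts:

    SELECT * EXCEPT(rn)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
      FROM `my_project.my_dataset.my_table`
    )
    WHERE rn = 1;

Wrapping that in a view lets the rest of your queries stay unchanged. Setting an explicit job id on your load jobs, as you mention, is also worth doing: BigQuery refuses to create two jobs with the same id, so a blind retry cannot load the same file twice.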

Exclude versioned documents while querying - RavenDB

I added the versioning bundle midway through my project, after having written most of my Raven queries in my data access layer. Now, because of versioning, I have lots of duplicated data: whenever I query a document type, I see each document repeated as many times as it has been versioned. Is there a way to stop the revision documents from showing up when I query for the current data, without rewriting all of my queries with Exclude("Revisions")? Is there a setting like "query on revisioned documents = false" that I can set globally? Please suggest something to overcome this.
That is the way it is supposed to work, actually: when the versioning bundle is active, revisions are kept out of query results. It appears that you have disabled the versioning bundle, which would cause the behaviour you are seeing.
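For reference, the bundle is driven by configuration documents. Below is a minimal sketch of enabling default versioning from the C# client; the class mirrors the bundle's Raven/Versioning/DefaultConfiguration document, but treat the exact property names as an assumption to verify against your RavenDB version:

    // Mirrors the versioning bundle's configuration document.
    public class VersioningConfiguration
    {
        public string Id { get; set; }
        public bool Exclude { get; set; }   // false = documents ARE versioned
        public int MaxRevisions { get; set; }
    }

    using (var session = store.OpenSession())
    {
        session.Store(new VersioningConfiguration
        {
            Id = "Raven/Versioning/DefaultConfiguration",
            Exclude = false,
            MaxRevisions = 5
        });
        session.SaveChanges();
    }

With the bundle active, revisions are marked as historical in their metadata and skipped by the indexes, so they should stop coming back from queries and the Exclude("Revisions") calls become unnecessary.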

How to find a document's visitor count?

I need to count the visitors of a particular document.
I can do it by adding a field and incrementing its value.
But the problem is the following:
I have 10 replicas in different locations, replicated on a schedule. So replication conflicts happen, because counting means editing the same document in different locations.
I would use an external solution for this. Just search for "visitor count" in your favorite search engine and choose a third party tool. You can then display the count on the page if that is important.
If you need to store the value in the database for some reason, perhaps you could store it as a new doc type that gets added each time (and cleaned up later) to avoid the replication issues.
Otherwise, if storing it isn't required, consider Google Analytics too.
I also faced this problem, and I can't say it has an easy solution. Document locking is the only solution I found, but a visitor count is still not possible that way.
It is possible, but not by updating the document. Instead, have an AJAX call to an agent or form with parameters on the URL identifying the document being read. This call writes a document into a tracking DB with one or two views, and then determines from those views how many reads you have had. The number of reads is the return value of the AJAX call.
This can be written in LS, Java or @Formulas. I would try to do it 100% in @Formulas to make it as efficient as possible.
You can also add logic to exclude reads from the same user or same source IP address.
The tracking database then replicates using the same schedule as the other database.
Daily or hourly agents can run to create summary documents and delete the detail documents so that you do not exceed the limits of @DbLookup.
If you do not need very nearly real-time counts (and that is the best you can get with a replicated system like this), you could use the web logs that Domino generates, finding the reads in the logs and building the counts in a document per server.
/Newbs
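A minimal LotusScript sketch of that agent approach; the form name, view name, and URL parameter are assumptions, not anything from the answer above:

    ' Web agent in the tracking DB, invoked via AJAX as:
    '   /tracking.nsf/CountRead?OpenAgent&unid=<document unid>
    Sub Initialize
        Dim session As New NotesSession
        Dim db As NotesDatabase
        Dim ctx As NotesDocument
        Dim track As NotesDocument
        Dim v As NotesView
        Dim unid As String

        Set db = session.CurrentDatabase          ' the tracking DB
        Set ctx = session.DocumentContext         ' CGI variables for web agents
        unid = StrRight(ctx.Query_String(0), "unid=")

        ' Record this read as its own tiny document, so the original
        ' document is never edited and replication cannot conflict.
        Set track = db.CreateDocument
        track.Form = "ReadEvent"
        track.TargetUNID = unid
        track.ReadAt = Now
        Call track.Save(True, False)

        ' Count reads via a view keyed on TargetUNID and return the total
        ' as the AJAX response.
        Set v = db.GetView("ReadsByUNID")
        Print "Content-type: text/plain"
        Print CStr(v.GetAllEntriesByKey(unid, True).Count)
    End Sub

An hourly or daily agent can then roll the ReadEvent documents up into summary documents and delete the detail, as described above.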
Back in the 90s, we had a client that needed to know that each person had read a document without them clicking to sign or anything.
The initial solution was to add each name to a text field on a separate tracking document. This ran into problems real fast once the field grew past 32K. Then one of my colleagues realized you could just have it create a document for each user to record that they'd read it.
Heck, you could have one database used to track all reads for all users of all documents, since one user can only open one document at a time -- each time they open a new document, either add that value to a field or create a field named after the document they've read on their own "reader tracker" document.
Or you could make that a mail-in database, so no worries about replication. Each time someone opens a document for which you want to track reads, it creates a tiny document holding only their name and which document they read, and that gets mailed into the "read counter database". If you don't care who read it, an agent runs on a schedule, updates the count, and deletes the mailed-in documents.
There really are a lot of ways to skin this cat.

Building a ColdFusion Application with Version Control

We have a CMS built entirely in house. I'm the new web developer guy with literally 4 weeks of ColdFusion experience. What I want to do is add version control to our dynamic pages, something like what WordPress does. When you modify a page in WordPress, it makes some database entries and keeps a copy of each page when you save it. So if you create a page and modify it 6 times, all in one day, you have 7 different versions to roll back to if necessary. Is there an easy way to do something similar in ColdFusion?
Please note I'm not talking about source control or version control of the actual CFM files; all pages are built on the backend dynamically using SQL.
Sure you can. Just stash the page content in another database table. You can do that with ColdFusion or via a trigger in the database.
One way (there are many) to do this is to add a column called "version" and a column called "live" in the table where you're storing all of your cms pages.
The "live" column is optional but might make things easier for you in some ways when starting out.
The "version" column tells you the revision number of a document in the CMS. By a process of elimination, you could say the newest one (highest version #) is the latest and live one. However, you may sometimes need to override this and put an old page live again, which is what the "live" flag is for.
So when you click "edit" on a page, you take the version that was clicked and copy it into a new, higher version number. It stays a draft until you click publish, at which time it's written as "live".
I hope that helps. This kind of approach should work okay with most schema designs, but I can't say for sure without seeing yours.
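A sketch of that approach in SQL Server-flavoured SQL; the table, column names, and hard-coded ids are purely illustrative:

    -- One row per version of a page; the highest version number is the
    -- newest, and live marks the version actually being served.
    CREATE TABLE cms_pages (
        page_id   INT           NOT NULL,
        version   INT           NOT NULL,
        live      BIT           NOT NULL DEFAULT 0,
        title     VARCHAR(255)  NOT NULL,
        body      TEXT          NOT NULL,
        modified  DATETIME      NOT NULL,
        PRIMARY KEY (page_id, version)
    );

    -- "Edit" copies the clicked version (here version 3 of page 42) into
    -- a new, higher version number; it stays live = 0 until published.
    INSERT INTO cms_pages (page_id, version, live, title, body, modified)
    SELECT page_id,
           (SELECT MAX(version) FROM cms_pages WHERE page_id = 42) + 1,
           0, title, body, GETDATE()
    FROM cms_pages
    WHERE page_id = 42 AND version = 3;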
Jas' solution works well if most of the changes are to one field, for example the full text of a page of content.
However, if you have many fields and people only tend to change one or two at a time, a new entry in the table for each version can quickly get out of hand, with many almost identical versions in the history.
In this case, what I like to do is store the changes on a per-field basis in a ChangeHistory table. I include the table name, row ID, field name, previous value, new value, and who made the change and when.
This acts as a complete change history for any field in any table. I'm also able to view changes by record, by user, or by field.
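Something like this, using the columns listed above (types are SQL Server-flavoured):

    -- One row per changed field, covering any table in the CMS.
    CREATE TABLE ChangeHistory (
        change_id      INT IDENTITY(1,1) PRIMARY KEY,
        table_name     VARCHAR(128) NOT NULL,
        row_id         INT          NOT NULL,
        field_name     VARCHAR(128) NOT NULL,
        previous_value TEXT         NULL,
        new_value      TEXT         NULL,
        changed_by     VARCHAR(128) NOT NULL,
        changed_at     DATETIME     NOT NULL DEFAULT GETDATE()
    );

Filtering on row_id gives per-record history, changed_by gives per-user history, and field_name gives per-field history.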
For realtime page generation from the database, your best bet is separate "live" and "versioned" tables, the reason being that keeping all data, live and versioned, in one table will negatively impact performance. If page generation relies on a single SELECT query from the live table, you can easily version the result set using ColdFusion's Web Distributed Data eXchange format (WDDX) via the <cfwddx> tag. WDDX is a serialized data format that works particularly well with ColdFusion data (sorta like Python's pickle, albeit without the ability to deal with objects).
The versioned table could look like this:
PageID
Created
Data
where Data is the column storing the WDDX packet.
Note: you could also use the built-in JSON support for version serialization (serializeJSON & deserializeJSON), but cfwddx tends to be more stable.
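A minimal CFML sketch of that flow; the datasource, table, and variable names are assumptions:

    <!--- Load the live page with a single SELECT. --->
    <cfquery name="livePage" datasource="cms">
        SELECT PageID, Title, Body
        FROM pages_live
        WHERE PageID = <cfqueryparam value="#url.pageId#" cfsqltype="cf_sql_integer">
    </cfquery>

    <!--- Serialize the whole result set into a WDDX packet. --->
    <cfwddx action="cfml2wddx" input="#livePage#" output="packet">

    <!--- Stash the packet in the versioned table. --->
    <cfquery datasource="cms">
        INSERT INTO pages_versioned (PageID, Created, Data)
        VALUES (
            <cfqueryparam value="#url.pageId#" cfsqltype="cf_sql_integer">,
            <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp">,
            <cfqueryparam value="#packet#" cfsqltype="cf_sql_longvarchar">
        )
    </cfquery>

Restoring a version is the reverse: SELECT the Data column and convert it back with <cfwddx action="wddx2cfml">.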