CouchDB View, Map, Index, and Sequence - indexing

I think I read somewhere that when a view is requested, the "map" is only run across documents that have been added since the last time it was requested? How is this determined? I thought I saw something about a sequence number. Is this something you can get at? It's not part of the UUID trailing the _rev field, is it?
Is there any way to force a recalculation of the entire view (across all documents)?

The section about View Indexes in the Technical Overview is a great guide to this.
The view builder uses the database sequence ID to determine if the view group is fully up to date with the database. If not, the view engine examines all database documents (in packed sequential order) that have changed since the last refresh. Documents are read in the order they occur in the disk file, reducing the frequency and cost of disk head seeks.
As documents are examined, their previous row values are removed from the view indexes, if they exist. If the document is selected by a view function, the function results are inserted into the view as a new row.
CouchDB first checks whether anything has changed in the entire database, using a sequence ID that gets updated whenever any document in the database changes. If something has changed, it finds those documents and runs the map function on them.
There really shouldn't be any need to rebuild/regenerate your views, since they refresh incrementally as you modify your documents (note that a view won't update until you use it, though). With that said, one way (and I'm sure there's a better way) would be to remove the design document describing the view and insert it again, seeing as a design document is (almost) no different from a normal document.
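For completeness, here is a minimal sketch of that delete-and-reinsert approach in Python (assuming a local CouchDB; the database and design document names are made up, and authentication is omitted):

    import requests

    DDOC = "http://localhost:5984/mydb/_design/reports"  # hypothetical names

    # Fetch the current design document; its _rev is needed for the delete.
    ddoc = requests.get(DDOC).json()

    # Deleting the design document discards the view indexes built from it.
    requests.delete(DDOC, params={"rev": ddoc.pop("_rev")})

    # Re-inserting the same view functions forces a full rebuild the next
    # time any view in the group is queried.
    requests.put(DDOC, json=ddoc).raise_for_status()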

Related

Any ALV-specifics with itab created by RTTS?

I create an internal table in two steps, both using RTTS techniques.
The first step loads and parses a tab-delimited file into a table.
The second step reads this table's structure via RTTI, then adds some other hardcoded columns in front of the old columns from the file, and finally adds the old fields back again; the table now has about 12 new hardcoded columns in front of those from the file. RTTS helps create the final table, which is then passed as the data source to the ALV grid.
My original requirement did not anticipate that the end user would ever need the ALV grid toolbar functions; however, as always, this has changed. I enabled the default toolbar functions, without any custom buttons.
So now the user can remove some columns from the display, add them back again, and change their order. Everything is fine, but I have never encountered this situation with a table created at runtime.
Are there special pitfalls I need to be aware of?
An <ITAB> created using RTTS is fully supported by both REUSE_ALV_LIST_DISPLAY and the ALV OO technologies. All the layouts should work fine. In fact, I think that in cl_salv_table=>factory, RTTS is responsible for the automatic creation of the ITAB's field catalog, since it does not need a field catalog passed as a parameter. The only issue I have heard of is lost pointers to the <ITAB>, which leads to refresh problems and so on, but that is a different story.
From my experience, the ALV column maximum size is 120 characters. So if your file could have columns longer than that, you could have a problem. Otherwise, do not expect any major issues.

How should I deal with copies of data in a database?

What should I do if a user has a few hundred records in the database and would like to take all the current data, make some changes, and save the result as a draft, potentially for good, keeping both copies?
Should I duplicate all the data in the same table and mark it as a draft?
Or should I duplicate only the changes, and fall back to the "non-draft" data where no changes exist?
The user should also be able to go back to the live data and make changes there without affecting the draft.
Simply introduce a version field in the tables that would be affected.
Content management systems (CMS) do this already. You create a blog post, for example, and it has version 1. Then a change is made, which becomes version 2, and so on.
You will obviously end up storing quite a bit more data. A nice benefit though is that you can easily write queries to load a version (or a snapshot) of data.
As a convention, you could always make the highest version number the "active" version.
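A rough sketch of the pattern, using SQLite and made-up table/column names (not a definitive implementation):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE post (
            post_id INTEGER, version INTEGER, body TEXT,
            PRIMARY KEY (post_id, version)
        );
        INSERT INTO post VALUES (1, 1, 'first draft');
    """)

    # Saving never updates a row in place -- it appends the next version.
    def save(post_id, body):
        con.execute("""
            INSERT INTO post
            SELECT ?, COALESCE(MAX(version), 0) + 1, ?
            FROM post WHERE post_id = ?
        """, (post_id, body, post_id))

    save(1, "second draft")

    # Convention: the highest version number is the active one.
    print(con.execute("""
        SELECT body FROM post WHERE post_id = 1
        ORDER BY version DESC LIMIT 1
    """).fetchone())  # -> ('second draft',)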
You can either use BEGIN TRANSACTION, COMMIT, and ROLLBACK statements, or you can create a stored procedure / piece of code so that any amendments the user makes are put into temporary tables until they are ready to be put into production.
If you are making a raft of changes, it is best to use temporary tables, as holding a transaction open can result in locks on the live data for other users.
This article might help if the above means nothing to you: http://www.sqlteam.com/article/temporary-tables
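Here is a rough sketch of the temporary-table staging idea, with SQLite standing in for SQL Server (all table and column names are illustrative):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE price (sku TEXT PRIMARY KEY, amount REAL)")
    con.execute("INSERT INTO price VALUES ('A1', 10.0)")

    # Stage the user's amendments in a TEMP table; the live table stays
    # untouched (and unlocked) until the short transaction below.
    con.execute("CREATE TEMP TABLE price_edit AS SELECT * FROM price WHERE 0")
    con.execute("INSERT INTO price_edit VALUES ('A1', 12.5)")

    # Apply all staged changes at once, so live rows are locked only briefly.
    with con:  # BEGIN ... COMMIT (ROLLBACK if an exception is raised)
        con.execute("""
            UPDATE price
            SET amount = (SELECT e.amount FROM price_edit e
                          WHERE e.sku = price.sku)
            WHERE sku IN (SELECT sku FROM price_edit)
        """)

    print(con.execute("SELECT * FROM price").fetchall())  # -> [('A1', 12.5)]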
EDIT - You could create new tables (i.e. not temporary, but full-fledged SQL tables) "on the fly" and name them something meaningful, for instance the user's initials, followed by the original table name, followed by a timestamp.
You can then programmatically create, amend, and delete these tables over long periods of time, as well as compare them against the live tables. You would need to keep track of how many tables are being created in case your database grows to a vast size.
The only major headache then is putting the changes back into the live data. For instance, if someone takes a cut of data into a new table and then 3 weeks later decides to send it into live after making changes. In this instance there is a likelihood of the live data having changed anyway and possibly superseding the changes the user will submit.
You can get around this with some creative coding though. There are many ways to tackle this, so if you get stuck at the next step you might want to start a new question. Hopefully this at least gives you some inspiration though.
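A sketch of the "on the fly" scratch-table idea, again with SQLite standing in and made-up names (the initials 'jd' are hypothetical):

    import sqlite3, time

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    con.execute("INSERT INTO orders VALUES (1, 9.99)")

    # Scratch-table name: user initials + source table + timestamp,
    # as suggested above.
    scratch = f"jd_orders_{int(time.time())}"
    con.execute(f"CREATE TABLE {scratch} AS SELECT * FROM orders")

    # The user edits the scratch copy over days or weeks...
    con.execute(f"UPDATE {scratch} SET total = 12.50 WHERE id = 1")

    # ...and before merging you can diff it against the live table to spot
    # rows that changed in live (or in the draft) in the meantime.
    diff = con.execute(f"""
        SELECT o.id, o.total AS live_total, s.total AS draft_total
        FROM {scratch} s JOIN orders o ON o.id = s.id
        WHERE o.total <> s.total
    """).fetchall()
    print(diff)  # -> [(1, 9.99, 12.5)]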

How to find a document's visitor count?

I need to count the visitors for a particular document.
I could do this by adding a field to the document and incrementing its value.
But the problem is the following:
I have 10 replicas in different locations, replicated on a schedule. Replication conflicts occur because the visit counter edits the same document in each location.
I would use an external solution for this. Just search for "visitor count" in your favorite search engine and choose a third party tool. You can then display the count on the page if that is important.
If you need to store the value in the database for some reason, perhaps you could store it as a new doc type that gets added each time (and cleaned up later) to avoid the replication issues.
Otherwise, if storing it isn't required, consider Google Analytics too.
I have also faced this problem, and I cannot say it has an easy solution. Document locking is the only workaround I found, but an accurate visitor count is not possible that way.
It is possible, but not by updating the document. Instead, have an AJAX call to an agent or form with parameters on the URL identifying the document being read. This call writes a document into a tracking DB with one or two views and then determines from those views how many reads you have had. The number of reads is the return value of the AJAX call.
This can be written in LotusScript, Java, or @Formulas. I would try to do it 100% in @Formulas to make it as efficient as possible.
You can also add logic to exclude reads from the same user or same source IP address.
The tracking database then replicates using the same schedule as the other database.
Daily or hourly agents can run to create summary documents and delete the detail documents so that you do not exceed the limits of @DbLookup.
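To illustrate the shape of this detail-plus-summary approach, here is a sketch in Python with SQLite standing in for the tracking database (a real Domino agent would be LotusScript, Java, or @Formulas; all names are made up):

    import sqlite3
    from datetime import datetime

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE read_event   (doc_id TEXT, reader TEXT, read_at TEXT);
        CREATE TABLE read_summary (doc_id TEXT PRIMARY KEY, reads INTEGER);
    """)

    # Each page view appends one tiny row. Appends from different replicas
    # never collide, which is why this survives replication while a counter
    # field on the document itself does not.
    def record_read(doc_id, reader):
        con.execute("INSERT INTO read_event VALUES (?, ?, ?)",
                    (doc_id, reader, datetime.now().isoformat()))

    # The scheduled agent's job: fold detail rows into summaries, then purge.
    def summarize():
        totals = con.execute(
            "SELECT doc_id, COUNT(*) FROM read_event GROUP BY doc_id").fetchall()
        for doc_id, n in totals:
            updated = con.execute(
                "UPDATE read_summary SET reads = reads + ? WHERE doc_id = ?",
                (n, doc_id)).rowcount
            if not updated:
                con.execute("INSERT INTO read_summary VALUES (?, ?)", (doc_id, n))
        con.execute("DELETE FROM read_event")

    record_read("memo-42", "jdoe")
    record_read("memo-42", "asmith")
    summarize()
    print(con.execute("SELECT * FROM read_summary").fetchall())
    # -> [('memo-42', 2)]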
If you do not need very nearly real-time counts (and that is the best you can get with a replicated system like this), you could use the web logs that Domino generates, finding the reads in the logs and building the counts in a document per server.
/Newbs
Back in the 90s, we had a client that needed to know that each person had read a document without them clicking to sign or anything.
The initial solution was to add each name to a text field on a separate tracking document. This ran into problems real fast once the field grew past 32K. Then one of my colleagues realized you could just have it create a document for each user to record that they'd read it.
Heck, you could have one database used to track all reads for all users of all documents, since one user can only open one document at a time -- each time they open a new document, either add that value to a field or create a field named after the document they've read on their own "reader tracker" document.
Or you could make that a mail-in database, so there are no worries about replication. Each time they open a document for which you want to track reads, it creates a tiny document holding only their name and which document they read, and that gets mailed into the "read counter database". If you don't care who read it, an agent that runs on a schedule updates the count and deletes the mailed-in documents.
There really are a lot of ways to skin this cat.

Loading Razor views from a database - VirtualPathProvider and CacheDependency confusion

I'm confused as to how CacheDependency works in VirtualPathProvider.GetCacheDependency().
Every example I've seen creates a cache dependency based on some physical file on disk, while I'm returning records from a database. Right now I'm overriding GetFileHash and just returning the date/time the relevant record was last modified as the hash string. This works well, and I'm not sure a CacheDependency item would improve performance, as I'd still have to check the database every time the view is requested to see whether it's been updated, but I'm still curious how to use CacheDependency.
Has anyone used this when returning views from a database?
Update
I'm now using RazorEngine (http://razorengine.codeplex.com/), which works VERY well.
The point of CacheDependency is to provide you with an event that will be called when the cached item becomes invalid (because the file on disk changed). Check out SqlCacheDependency, which does the same thing with SQL Server entries.

Building a ColdFusion Application with Version Control

We have a CMS built entirely in house. I'm the new web developer guy with literally 4 weeks of ColdFusion experience. What I want to do is add version control to our dynamic pages, something like what WordPress does. When you modify a page in WordPress, it makes some database entries and keeps a copy of each page when you save it. So if you create a page and modify it 6 times, all in one day, you have 7 different versions to roll back to if necessary. Is there an easy way to do something similar in ColdFusion?
Please note I'm not talking about source control or version control of actual CFM files, all pages are done on the backend dynamically using SQL.
Sure you can. Just stash the page content in another database table. You can do that with ColdFusion or via a trigger in the database.
One way (there are many) to do this is to add a column called "version" and a column called "live" in the table where you're storing all of your cms pages.
The live column is optional but might make things easier for you in some ways when starting out.
The version column tells you which revision of a document in the CMS you have. By a process of elimination you could say the newest one (highest version number) is the latest, live one. However, you may sometimes need to override this and turn an old page live again, which is what the live flag is for.
So when you click "edit" on a page, you take the version that was clicked and copy it into a new, higher version number. It stays a draft until you click publish (at which time it's written as 'live').
I hope that helps. This kind of approach should work okay with most schema designs, but I can't say for sure without seeing yours.
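To make the edit/publish flow concrete, here is a hedged sketch with SQLite and invented names (your schema will differ):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE cms_page (
            page_id INTEGER, version INTEGER, content TEXT, live INTEGER,
            PRIMARY KEY (page_id, version)
        );
        INSERT INTO cms_page VALUES (1, 1, 'original copy', 1);
    """)

    # "Edit" copies the clicked version into a new, higher draft version.
    def edit(page_id, version, new_content):
        con.execute("""
            INSERT INTO cms_page
            SELECT page_id,
                   (SELECT MAX(version) + 1 FROM cms_page WHERE page_id = ?),
                   ?, 0
            FROM cms_page WHERE page_id = ? AND version = ?
        """, (page_id, new_content, page_id, version))

    # "Publish" just moves the live flag -- which also lets you turn an old
    # version live again without touching the history.
    def publish(page_id, version):
        con.execute("UPDATE cms_page SET live = 0 WHERE page_id = ?", (page_id,))
        con.execute("UPDATE cms_page SET live = 1 "
                    "WHERE page_id = ? AND version = ?", (page_id, version))

    edit(1, 1, "revised copy")
    publish(1, 2)
    print(con.execute(
        "SELECT version, content FROM cms_page WHERE page_id = 1 AND live = 1"
    ).fetchall())  # -> [(2, 'revised copy')]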
Jas' solution works well if most of the changes are to one field, for example the full text of a page of content.
However, if you have many fields and people only tend to change one or two at a time, a new entry in the table for each version can quickly get out of hand, with many almost identical versions in the history.
In that case, what I like to do is store the changes on a per-field basis in a ChangeHistory table. I include the table name, row ID, field name, previous value, new value, and who made the change and when.
This acts as a complete change history for any field in any table. I'm also able to view changes by record, by user, or by field.
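A small sketch of that per-field history, in Python with SQLite (the helper and all names are invented for illustration):

    import sqlite3
    from datetime import datetime

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE ChangeHistory (
            table_name TEXT, row_id INTEGER, field_name TEXT,
            old_value TEXT, new_value TEXT,
            changed_by TEXT, changed_at TEXT
        )
    """)

    # Record one row per field that actually changed, not a full row copy.
    def log_changes(table, row_id, old, new, user):
        now = datetime.now().isoformat()
        con.executemany(
            "INSERT INTO ChangeHistory VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(table, row_id, f, old.get(f), new[f], user, now)
             for f in new if new[f] != old.get(f)])

    log_changes("page", 7,
                {"title": "Home", "body": "Welcome!"},
                {"title": "Home", "body": "Hello there!"}, "jdoe")

    # The history can then be sliced by record, by user, or by field.
    print(con.execute(
        "SELECT field_name, old_value, new_value FROM ChangeHistory "
        "WHERE table_name = 'page' AND row_id = 7").fetchall())
    # -> [('body', 'Welcome!', 'Hello there!')]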
For real-time page generation from the database, your best bet is separate "live" and "versioned" tables, the reason being that keeping all data, live and versioned, in one table will negatively impact performance. If page generation relies on a single SELECT query from the live table, you can easily version the result set using ColdFusion's Web Distributed Data eXchange format (WDDX) via the <cfwddx> tag. WDDX is a serialized data format that works particularly well with ColdFusion data (sorta like Python's pickle, albeit without the ability to deal with objects).
The versioned table could look like this:
PageID
Created
Data
where Data is the column storing the WDDX packet.
Note that you could also use the built-in JSON support for version serialization (serializeJSON & deserializeJSON), but cfwddx tends to be more stable.
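To show the snapshot idea without ColdFusion at hand, here is a sketch using Python, with JSON standing in for the WDDX packet that <cfwddx action="cfml2wddx"> would produce (table and column names are invented):

    import json, sqlite3
    from datetime import datetime

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE page (PageID INTEGER PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE page_version (PageID INTEGER, Created TEXT, Data TEXT);
        INSERT INTO page VALUES (7, 'Home', 'Welcome!');
    """)

    # Snapshot the live row by serializing it whole into the Data column.
    row = dict(zip(("PageID", "title", "body"),
                   con.execute("SELECT * FROM page WHERE PageID = 7").fetchone()))
    con.execute("INSERT INTO page_version VALUES (?, ?, ?)",
                (7, datetime.now().isoformat(), json.dumps(row)))

    # Restoring a version = deserialize the chosen snapshot and write it live.
    snap = json.loads(con.execute(
        "SELECT Data FROM page_version WHERE PageID = 7 "
        "ORDER BY Created DESC LIMIT 1").fetchone()[0])
    con.execute("UPDATE page SET title = ?, body = ? WHERE PageID = ?",
                (snap["title"], snap["body"], snap["PageID"]))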