RavenDB - write to documents from transformer - ravendb

I've got an index that goes over a huge number of documents, and then a transformer that shapes a return and does some math logic to them.
Is it possible to write back to a field on the documents from within the transformer or index, instead of having to fetch the data and send another request to write to each document?
So for example, I have documents Scores, each has a property called Values that is a IList<double>.
I have an index that gets all of them, and a Transformer that does some math based on other properties in the retrieved documents.
var results =
session
.Query<Score, ScoresByName>()
.TransformWith<ScoresTransformer, ScoresTransformer.Result>()
.ToList();
Is it possible to write to each document before it ever comes back to me?
Basically, after the transformer runs, each document has new information in its Values property. I wish to just write that to the document; Otherwise, I have to run this query and transformer, then either write to each document in a loop, or run a patch request. I'd like to avoid that if it is possible.

You can use the scripted index results for this:
http://ravendb.net/docs/article-page/3.0/Csharp/server/bundles/scripted-index-results

Related

How to query Ravendb document size

I have a scenario in which I need to find the largest documents in my Ravendb database.
When I select any given document in Ravendb Studio, the size is displayed in the Properties section as circled in red in this screen shot:
Is there a query I can run that will order documents by this Size property so that I can identity the largest documents?
Maybe write a method that calculates your object size, probably using reflection.
Then, create a static Map Index with a field 'size',
and set it with your method that you will provide in the 'additional sources' in the index
See https://ravendb.net/docs/article-page/4.2/Csharp/studio/database/indexes/create-map-index#additional-sources
And then you could query this index and order-by the 'size' field
fyi - you can get a specific document size using the following endpoint:
{yourServerUrl}/databases/{yourDatabaseName}/docs/size?id={yourDocumentId}
Learn about ravenDB rest api in:
https://ravendb.net/docs/article-page/4.2/csharp/client-api/rest-api/rest-api-intro
Index (Map) definition:
from doc in docs
select new {
doc.BlittableJson.Size
}

querying categorymembers with wikimedia and the size

I try to get the page sizes of all category members through the wikimedia api with only one request.(or less then 10).
I know I would get the sizes of pages by:
(1) Requesting every page separately and get the size
or
(2) A search query like this:
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=physics
The result is several pages with the size and word count property.
Now how can I get the size and word count for a category member with a query like this or with another trick ?
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics
Any hints shared would be appreciated.
You can use a category query as a generator, using generator=categorymember and gcmtitle=Category:Physics. This will execute the query action for each and every page in that category:
api.php?action=query
&generator=categorymembers
&gcmtitle=Category:Lakes
&prop=info
In the docs you can see what properties can be used as generators: categories, links and templates. Also, more or less every list module can be used as a generator in the same fashion.
Note that parameter names are prefixed with a g when used for a generator, so that cmtitle in the example above becomes gcmtitle, to distinguish them from parameters to the query action (that is applied to every page returned by the generator), prop and inprop, parameters

docx4j Differencer Showing More Differences Than Expected

I have two documents:
Document 1 (input)
Document 2 (output)
Document 2 is the result of passing Document 1 through a transformation process which leaves any content and formatting intact (verified by side-by-side compare in Word).
However, the process removes many id numbers from the .docx files.
For example,
<w:p w:rsidP="00B600D6" w:rsidR="00F55D78" w:rsidRDefault="00B600D6">
becomes
<w:p>
according to a dump of each document via the following code:
Body body = ((Document)newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
Node node = org.docx4j.XmlUtils.marshaltoW3CDomDocument(body).getDocumentElement();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(node),
new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
Using the docx4j Differencer comparison method recommended here, everything (except the first line which has no formatting applied) is shown as a modification.
Question is: Are the diffs a result of the missing id's, the formatting or something else?
In case it's important, we're using docx4j in this context to perform automated sanity/regression tests on our round-tripping proceess (i.e. apply the "loss-less" process and expect no differences)
Disclosure: I work on docx4j
If the only difference between paragraphs is the rsid attributes, they will still be detected as different.
You could "clean" the documents before performing the comparison, so that neither docx has rsid attributes. See the Filter sample.
By the way, an easier way to see the XML for an object (eg a single paragraph, or the entire body) is to use XmlUtils.marshaltoString

Is it possible to compare 2 dimensions in a Google Analytics query filter?

I'm just starting out with the Google Analytics API and am wondering if it's possible to compare two dimensions via an operand in the filters I pass in the query. And by wondering I mean I've tried it, but have had no success.
Specifically I'm trying to compare 2 custom variable values. One holds the user who created a post (customVarValue3), the other the user who is viewing the post (customVarValue5). I want to get the pageviews only for the visitors who are not also the creator. The filter looks like this (without urlencoding applied):
ga:customVarValue3!=ga:customVarValue5
The full query (url encoded) looks like this:
https://www.google.com/analytics/feeds/data?ids=ga%3Axxxxxx&dimensions=ga%3AcustomVarValue1%2Cga%3AcustomVarValue2%2Cga%3AcustomVarValue3&metrics=ga%3Apageviews&filters=ga%3AcustomVarValue3!%3Dga%3AcustomVarValue5&sort=-ga%3Apageviews&start-date=2012-02-09&end-date=2012-02-23&max-results=50
However, it returns the same results (and I know there are results where ga:customVarValue3 == ga:customVarValue5).
Probably it isn't possible, but I just wanted to see if anyone knew how to achieve this or has a workaround or something.
No, it is not possible using the GAv3 API in its present state. You can, however, get all the results by using the specified two custom variables as the dimensions for a report, and programmatically filter out the unnecessary results.
Some simple construct like
for(var item in collatedResultsListwithDimensions) {
for(var row in item.rows) {
if(row[0]!=row[1])
newResultRows.push(row);
}
}
Now your newResultRows will have those rows where row[1]!=row[0] assuming the two custom variables you mentioned are the first two dimensions.

Use custom function to populate gSpreadsheet cell based on a XML/JSON response

Ok, this one has become a little tricky for me and I really need some assistance to work through it.
Problem
I have a GSpreadsheet which has a list of data, in this case Twitter usernames. Using the API of a service provider (in this case the Klout API), I would like to retrieve information about that user to populate a cell within a spreadsheet.
Based on what I can work out so far, I would need to write a custom function to do this but I have no idea where to start, how I might construct it, or if there are any examples of doing this.
Scenario
The Klout API can return either an XML or JSON response (see http://developer.klout.com/docs/read/api/API), based on the string passed. For example, the URL:
http://api.klout.com/1/users/show.xml?key=SECRET&users=thewinchesterau
would return the following XML response:
<users>
<user>
<twitter_id>17439480</twitter_id>
<twitter_screen_name>thewinchesterau</twitter_screen_name>
<score>
<kscore>56.63</kscore>
<slope>0</slope>
<description>creates content that is spread throughout their network and drives discussions.</description>
<kclass_id>10</kclass_id>
<kclass>Socializer</kclass>
<kclass_description>You are the hub of social scenes and people count on you to find out what's happening. You are quick to connect people and readily share your social savvy. Your followers appreciate your network and generosity.</kclass_description>
<kscore_description>thewinchesterau has a low level ofinfluence.</kscore_description>
<network_score>58.06</network_score>
<amplification_score>29.16</amplification_score>
<true_reach>90</true_reach>
<delta_1day>0.3</delta_1day>
<delta_5day>0.5</delta_5day>
</score>
</user>
</users>
Based on this response, I would like to be able to populate different cells with the values returned within the XML (or JSON if easier) packet.
So, for example, I would have a spreadsheet like the following which would have custom functions to go out and retrieve the value of the relevant XML element response to populate the cell:
Cell A B C D E
1 Username kscore Network score Amplification score True reach
2 thewinchester =kscore(A2) =nscore(A2) =ascore(A2) =tscore(A2)
Questions
Are there any gSpreadsheet examples you know of that use an API to pull data in from an external source?
How would one write a custom function to fetch the result from the API and populate a cell with a result of a specific element?
Any information, examples or helpers you have are greatly appreciated.
You want the importXML function, documented here. The formula you want will look something like this:
=importXML("http://api.klout.com/1/users/show.xml?key=SECRET&users=" + A1, "//users/user/score/kscore")
You could write a custom script with Google AppScript, but there's a simple solution to this similar to what Nick Johnson posted. I've tested this against the score function, but it could be easily adapted to the show endpoint with different XPath.
=importXML("http://api.klout.com/1/klout.xml?users="&A1&"&key=YOUR_API_KEY", "//users/user/kscore")
This presumes your Twitter IDs are in the A column.
Note, Google Docs limits the number of such importXML functions to 50 per spreadsheet. You could concatenate groups of 5 userids for each importXML call, effectively putting your limit to 250 a sheet.
This could also be adapted to a similar call in Excel that doesn't have that limit. Keep in mind the Klout ToS, though, using proper attribution and rate limits.