How to query RavenDB document size

I have a scenario in which I need to find the largest documents in my RavenDB database.
When I select any given document in RavenDB Studio, its size is displayed in the Properties section, as circled in red in this screenshot:
Is there a query I can run that orders documents by this Size property so that I can identify the largest documents?

One option is to write a method that calculates your object's size, probably using reflection.
Then create a static Map index with a 'size' field, and set that field using your method, which you provide in the index's 'Additional sources'.
See https://ravendb.net/docs/article-page/4.2/Csharp/studio/database/indexes/create-map-index#additional-sources
You can then query this index and order by the 'size' field.
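The "compute a size, then order by it" idea can be sketched client-side. This is a hypothetical illustration, not RavenDB's own calculation: instead of reflection it uses the byte length of the compact UTF-8 JSON serialization as a proxy for size, which will not match the blittable size shown in the Studio. The helper names are made up for this sketch.

```python
import json
from typing import Any, Dict, List, Tuple


def approx_size(doc: Dict[str, Any]) -> int:
    # Proxy for document size: bytes of the compact UTF-8 JSON encoding.
    # Assumption: RavenDB stores documents as blittable JSON, whose size
    # differs from plain JSON text, so treat this only as a ranking hint.
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def largest(docs: Dict[str, Dict[str, Any]], n: int) -> List[Tuple[str, int]]:
    # Compute a 'size' value per document id, then order by it descending,
    # which is what querying the index by the 'size' field does server-side.
    sized = [(doc_id, approx_size(doc)) for doc_id, doc in docs.items()]
    sized.sort(key=lambda pair: pair[1], reverse=True)
    return sized[:n]
```

In the index itself the same calculation would live in the C# additional source; the Python version only shows the shape of the logic.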
FYI, you can get the size of a specific document from the following endpoint:
{yourServerUrl}/databases/{yourDatabaseName}/docs/size?id={yourDocumentId}
Learn about the RavenDB REST API in:
https://ravendb.net/docs/article-page/4.2/csharp/client-api/rest-api/rest-api-intro
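A minimal sketch of calling that endpoint from a script, assuming an unsecured server (no client certificate) and using only the URL pattern given above. The id is percent-encoded because RavenDB ids usually contain '/'. The response is returned as parsed JSON without assuming any field names; size_url and fetch_size are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def size_url(server_url: str, database: str, doc_id: str) -> str:
    # Builds {server}/databases/{db}/docs/size?id={id}.
    base = server_url.rstrip("/")
    db = urllib.parse.quote(database, safe="")
    query = urllib.parse.urlencode({"id": doc_id})
    return f"{base}/databases/{db}/docs/size?{query}"


def fetch_size(server_url: str, database: str, doc_id: str) -> Any:
    # Plain GET; assumes the server does not require authentication.
    with urllib.request.urlopen(size_url(server_url, database, doc_id)) as resp:
        return json.load(resp)
```

For example, size_url("http://localhost:8080", "Northwind", "orders/830-A") yields http://localhost:8080/databases/Northwind/docs/size?id=orders%2F830-A. Calling this per document works for spot checks, but for finding the largest documents across a database the index approach avoids one request per document.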

Index (Map) definition:
from doc in docs
select new {
doc.BlittableJson.Size
}


AEM full-text search

I want to search for an exact combination of words in all nodes in AEM using Query Builder.
When I debug the query with http://localhost:4502/libs/cq/search/content/querydebug.html, it returns results that don't match my query.
For example, if I want to search for 'foo bar' in all nodes, I need to receive all nodes that contain 'Foo Bar', 'foo Bar', 'Foo bar' or 'FOO BAR', but not nodes that contain only 'foo' or only 'bar', and not 'foo-bar'. The query in my service is built with QueryBuilder.
QueryBuilder is useful when you want to perform a SQL-like query where you search against a property and its value. As you have experienced, the full-text search capabilities of the query debug interface are very limited.
However, remember that AEM uses an underlying Lucene and/or Solr index, and it does provide a way to perform a native Solr/Lucene query.
First, create an embedded Solr index (embedded is sufficient for a local development AEM instance) as described under "Configuring AEM with an embedded SOLR server" in https://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html. This triggers Solr indexing of your JCR content.
Once indexing is complete (as seen in the logs), you can run native queries from the CRXDE query interface.
Example query: select [jcr:path] from [nt:base] where native('solr', '<filter>?<solr_query_goes_here>'). Obviously, you need to be familiar with Solr query syntax. Credit goes to the following SlideShare deck (slide 50 covers native queries within AEM): http://www.slideshare.net/justinedelson/demystifying-oak-search
AEM support for native Solr queries is a bit patchy. You might need to edit the Solr schema XML file (created under the crx-quickstart folder) by hand to add filters, custom fields, etc. We successfully tuned Solr within AEM to perform a spatial search using this method.
If you need to match all case combinations of "foo bar", query:
fulltext=foo bar
By default you only get the first 10 results. To get all of them, add:
p.limit=-1
You may also want to restrict the path:
path=/content/website/
See the Adobe Query Builder API documentation for more information.
Behind the scenes, AEM creates an XPath query and then executes it. Then, for any part of the query that doesn't map to XPath, it runs through the results and filters them.
You should also consider whether there is a specific property to match, as opposed to any text. That will give you much better results, since you want accuracy. Right now you are casting an overly wide net, and I think you should consider restricting it, if only for performance reasons. Just a suggestion.
You say the results don't match your query; can you give us some idea of what comes back? And could you please post your actual query? That will make it much easier to help.
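The predicates above can also be tried directly over HTTP. A minimal sketch, assuming the Query Builder JSON servlet is available at /bin/querybuilder.json on the local instance from the question (http://localhost:4502); authentication is omitted and querybuilder_url is a hypothetical helper name.

```python
import urllib.parse


def querybuilder_url(host: str, path: str, fulltext: str, limit: int = -1) -> str:
    # Each predicate from the answer becomes one query-string parameter.
    params = {
        "path": path,
        "fulltext": fulltext,
        "p.limit": str(limit),  # -1 asks for all hits instead of the first 10
    }
    # quote (rather than quote_plus) keeps '/' readable and encodes spaces as %20.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote, safe="/")
    return f"{host.rstrip('/')}/bin/querybuilder.json?{query}"
```

Opening the resulting URL in a browser session that is logged in to AEM should return the hits as JSON, which makes it easy to compare what the fulltext predicate actually matches.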
This is a minimal example that performs a full-text search:
import com.day.cq.search.Predicate;
import com.day.cq.search.Query;
import com.day.cq.search.eval.FulltextPredicateEvaluator;
import com.day.cq.search.eval.PathPredicateEvaluator;

Query query = queryBuilder.createQuery(...);
// limit the search to a path
Predicate path = new Predicate(PathPredicateEvaluator.PATH);
path.set(PathPredicateEvaluator.PATH, "/content/where/ever");
query.getPredicates().add(path);
// full-text predicate, searched relative to the jcr:content child node
Predicate fulltextSearch = new Predicate(FulltextPredicateEvaluator.FULLTEXT);
fulltextSearch.set(FulltextPredicateEvaluator.FULLTEXT, "foo bar");
fulltextSearch.set(FulltextPredicateEvaluator.REL_PATH, "jcr:content");
query.getPredicates().add(fulltextSearch);
// include excerpts in the hits
query.setExcerpt(true);
// paging
query.setStart(...);
query.setHitsPerPage(-1);
Note: you don't need to configure a Solr index or anything else; this should work out of the box.
But if you limit the search to specific fields, you should create an index definition under oak:index. You can find a great cheat sheet here.
I'm not sure if this helps,
but to get all the combinations of nodes that contain the text I'm looking for, I use jcr:like in XPath.
For example, if I want to search all nodes that have any property with Foo bar in its value or key, my query looks like this:
/jcr:root/content/yourpath//*[jcr:like(\*/, '%FOO bar%')]
You won't get that flexibility in QueryBuilder, but you can still get what you want by using JCR-SQL2.
The following query returns all entries with "Foo Bar", "foo bar", "foo Bar" or "Foo bar", but not "foo", "bar" or "foo-bar", when your value is "foo bar".
SELECT * FROM [nt:unstructured] WHERE ISDESCENDANTNODE('/content/yourpath') AND LOWER([prop]) LIKE "%foo bar%" ORDER BY [cq:lastModified] desc
For case-insensitive search, make sure the value you compare against is sent in lowercase, since only the property side is lowercased.
For case-sensitive search you can use:
SELECT * FROM [nt:unstructured] WHERE ISDESCENDANTNODE('/content/yourpath') AND [prop] LIKE "%foo bar%" ORDER BY [cq:lastModified] desc
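To see why these two conditions behave as described, here is a small emulation of the LIKE comparison. It is a hypothetical Python helper, not Oak's implementation: it translates the % and _ wildcards into a regular expression, ignores LIKE escape characters, and mirrors LOWER([prop]) by lowercasing only the property value, so the pattern must already be lowercase.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    # Translate SQL LIKE wildcards: % -> any run of characters, _ -> exactly one.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    if case_insensitive:
        # Mirrors LOWER([prop]) LIKE '<pattern>': only the value is lowercased.
        value = value.lower()
    # LIKE must match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None
```

With this, like("Foo Bar", "%foo bar%", case_insensitive=True) is true while like("foo-bar", "%foo bar%", case_insensitive=True) is false. Because '%foo bar%' is a substring match, a value such as 'foo barn' matches as well.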

RavenDB - write to documents from transformer

I've got an index that goes over a huge number of documents, and a transformer that shapes the result and applies some math logic to it.
Is it possible to write back to a field on the documents from within the transformer or index, instead of having to fetch the data and send another request to write to each document?
So, for example, I have Score documents, each with a property called Values that is an IList<double>.
I have an index that gets all of them, and a Transformer that does some math based on other properties in the retrieved documents.
var results =
session
.Query<Score, ScoresByName>()
.TransformWith<ScoresTransformer, ScoresTransformer.Result>()
.ToList();
Is it possible to write to each document before it ever comes back to me?
Basically, after the transformer runs, each document has new information in its Values property. I just want to write that back to the document; otherwise, I have to run this query and transformer, and then either write to each document in a loop or run a patch request. I'd like to avoid that if possible.
You can use the scripted index results for this:
http://ravendb.net/docs/article-page/3.0/Csharp/server/bundles/scripted-index-results

How to use Sitecore Solr Custom Index

Could someone please help me understand the following?
Do we need to specify the name of the index in code when using Sitecore Solr search?
If we create a new custom index called 'sitecore_web-index_custom', how do we make sure we are using this index in code?
Thank you.
To get a Sitecore index, use the GetIndex method of the ContentSearchManager class:
Sitecore.ContentSearch.ContentSearchManager.GetIndex(...)
You can pass either an index name:
// get Sitecore built in index for current database:
string dbName = (Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database).Name;
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_" + dbName + "_index");
// get custom index
var customIndex = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web-index_custom");
or a Sitecore item:
// get index by Sitecore item
Sitecore.ContentSearch.ContentSearchManager.GetIndex((SitecoreIndexableItem)item);
In the second scenario, Sitecore will try to find the index in which the item is indexed.
There is no difference between getting Solr and Lucene indexes; the Sitecore API is transparent here.
More information about Sitecore search and indexing can be found in
Sitecore Search and Indexing Guide
Developer's Guide to Item Buckets and Search

Nested field in Solr 5.2

I'm new to Solr and I have a very specific problem that I need to solve:
I have a CSV file that contains my Solr documents. One column (field) is not only multiValued but also contains 'subfields'.
For example:
"id":"0101",
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
"mainProperty":"mainproperty1",
"URL":"http://www.mySite..."
where id, addMaterials, mainProperty and URL are my main fields, while 'name' and 'property' are my subfields. I know that Solr is designed to handle denormalized documents, but denormalizing is not an option for my application.
What I'm considering is splitting my data set: moving the fields that have subfields into separate documents and adding a new field (e.g. fromIdField) that links each of them to the original document.
Is there any other solution? My minimum goal is to index the values of the addMaterials field (even without indexing the subfields), that is, to go
from:
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
to
"addMaterials":{"name":"Mat1", "property":"prop1"}
"addMaterials":{"name":"Mat2", "property":"prop2"}
"addMaterials":{"name":"Mat3", "property":"prop3"}
Thanks in advance.
I have found a solution to my problem. Instead of splitting my data set, I kept addMaterials as a multiValued field and ignored the subfields, so there is only one multiValued field to index. I used Solr's CSV update request to index my file, with },{ as the separator for the addMaterials multiValued field. The indexed document looks like this:
"addMaterials": ["[{\"name\":\"Mat1\", \"property\":\"prop1\"",
"\"name\":\"Mat2\", \"property\":\"prop2\"",
"\"name\":\"Mat3\", \"property\":\"prop3\"}]"]
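The values above are what a plain string split of the raw cell on },{ produces, which is why the first value keeps the leading [{ and the last keeps the trailing }]. A small Python illustration of that, not Solr's own code; to_objects is a hypothetical extra showing how the subfields could be recovered later, assuming no value contains },{ or leading/trailing brackets of its own.

```python
import json
from typing import Dict, List


def split_cell(cell: str, separator: str = "},{") -> List[str]:
    # Same effect as f.addMaterials.split=true with separator },{ on one cell.
    return cell.split(separator)


def to_objects(values: List[str]) -> List[Dict[str, str]]:
    # Strip the leftover [{ / }] and re-wrap each value as a JSON object.
    objects = []
    for value in values:
        body = value.strip().lstrip("[{").rstrip("]}")
        objects.append(json.loads("{" + body + "}"))
    return objects
```

Applied to the addMaterials cell from the question, split_cell returns three strings shaped like the indexed values, and to_objects turns them back into three name/property objects.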
I indexed my document using this:
curl "http://localhost:8983/solr/<coreName>/update/csv?stream.file=C:/userName/Solr/solr-5.2.0/documentFolder/myFile.csv&f.addMaterials.split=true&f.addMaterials.separator=\},\{&stream.contentType=text/plain;charset=utf-8"
This assumes that addMaterials is a multiValued field, so make sure you update your schema before indexing your document with the procedure above. Otherwise, Solr returns an error saying that the field is not a multiValued field.
Of course, if you need to query against the subfields, you could probably use Solr's {!join} query parser.

Querying category members and their page sizes with the Wikimedia API

I am trying to get the page sizes of all category members through the Wikimedia API with only one request (or fewer than 10).
I know I can get the page sizes by:
(1) Requesting every page separately and get the size
or
(2) A search query like this:
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=physics
The result is several pages with the size and word count property.
Now, how can I get the size and word count for category members with a query like this, or with another trick?
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics
Any hints shared would be appreciated.
You can use a category query as a generator, using generator=categorymembers and gcmtitle=Category:Physics. This executes the query action for each and every page in that category:
api.php?action=query
&generator=categorymembers
&gcmtitle=Category:Lakes
&prop=info
In the docs you can see what properties can be used as generators: categories, links and templates. Also, more or less every list module can be used as a generator in the same fashion.
Note that parameter names are prefixed with a g when used for a generator, so cmtitle becomes gcmtitle in the example above. This distinguishes them from the parameters of the query action that is applied to every page returned by the generator, such as prop and inprop.
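A minimal sketch of the generator approach, assuming the English Wikipedia endpoint from the question and JSON output with formatversion=2 (where query.pages is a list). prop=info adds each page's length in bytes; gcmlimit=max requests the largest batch the API allows (typically 500 for regular clients), and the loop follows the API's continue block, so a large category takes a handful of requests instead of one per page. The helper names and the User-Agent string are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterator, Tuple

API = "https://en.wikipedia.org/w/api.php"


def category_params(category: str) -> Dict[str, str]:
    # Generator parameters carry the 'g' prefix (gcmtitle, gcmlimit).
    return {
        "action": "query",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmlimit": "max",
        "prop": "info",
        "format": "json",
        "formatversion": "2",
    }


def page_lengths(response: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    # With formatversion=2, query.pages is a list of page objects.
    for page in response.get("query", {}).get("pages", []):
        yield page["title"], page["length"]


def fetch_all(category: str) -> Dict[str, int]:
    # Keep requesting until the response no longer contains a 'continue' block.
    params = category_params(category)
    sizes: Dict[str, int] = {}
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"User-Agent": "category-size-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        sizes.update(page_lengths(data))
        if "continue" not in data:
            return sizes
        params = {**category_params(category), **data["continue"]}
```

For example, fetch_all("Category:Physics") returns a mapping from page title to length in bytes for every member of that category.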