RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
new BulkInsertOptions{ CheckForUpdates = true});
foreach(MyDocument myDoc in docCollection)
_bulk.Store(myDoc);
_bulk.Dispose();
And right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number which is less than the count I see in raven studio.

By default, the query you are doing limits to a sane number of results, part of RavenDB's promise to be safe by default and not stream back millions of records.
To get the number of documents of a specific type in your database, you need a special map-reduce index whose job is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it in code.
The source for that index is in this question but I'll copy it here:
// Index Name: Raven/DocumentCollections
// Map Query
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}
// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
public string Name { get; set; }
public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
.Where(x => x.Name == "MyDocument")
.FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously so after bulk-inserting a bunch of documents, the index may be stale. You can force it to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
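As a rough illustration of what the Map and Reduce queries above compute, here is a plain TypeScript sketch that runs the same logic in memory. It does not talk to RavenDB; the Doc shape, the sample documents, and the helper names (mapDocs, reduceCounts, countByCollection) are invented for the example.

```typescript
// In-memory stand-in for the Raven/DocumentCollections index.
// The Doc shape, sample data, and function names are hypothetical.
interface Doc {
  "@metadata": { "Raven-Entity-Name"?: string };
}

interface CollectionCount {
  Name: string;
  Count: number;
}

// Map: emit { Name, Count: 1 } for every document that has an entity name.
function mapDocs(docs: Doc[]): CollectionCount[] {
  const out: CollectionCount[] = [];
  for (const doc of docs) {
    const name = doc["@metadata"]["Raven-Entity-Name"];
    if (name != null) out.push({ Name: name, Count: 1 });
  }
  return out;
}

// Reduce: group by Name and sum the counts.
function reduceCounts(results: CollectionCount[]): CollectionCount[] {
  const totals = new Map<string, number>();
  for (const r of results) {
    totals.set(r.Name, (totals.get(r.Name) ?? 0) + r.Count);
  }
  return Array.from(totals, ([Name, Count]) => ({ Name, Count }));
}

function countByCollection(docs: Doc[]): CollectionCount[] {
  return reduceCounts(mapDocs(docs));
}

const sampleDocs: Doc[] = [
  { "@metadata": { "Raven-Entity-Name": "Orders" } },
  { "@metadata": { "Raven-Entity-Name": "Users" } },
  { "@metadata": { "Raven-Entity-Name": "Orders" } },
  { "@metadata": {} }, // no entity name: skipped by the map step
];

console.log(countByCollection(sampleDocs));
```

Each entry in the output corresponds to one result of the index, which is why filtering on Name and reading Count gives the per-collection total.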
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName which exists for every database, by sidestepping normal queries and getting metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
new Raven.Abstractions.Data.IndexQuery
{
Query = "Tag:MyDocument",
PageSize = 0
},
includes: null,
metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
Results: [ ],
Includes: [ ],
IsStale: false,
IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
TotalResults: 3,
SkippedResults: 0,
IndexName: "Raven/DocumentsByEntityName",
IndexEtag: "01000000-0000-0040-0000-00000000000B",
ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
Highlightings: { },
NonAuthoritativeInformation: false,
LastQueryTime: "2014-02-06T18:12:56.1990451Z",
DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
Session.Query<Course>()
.Statistics(out stats)
// Rest of query

How to make complex nested where conditions with typeORM?

I am having multiple nested where conditions and want to generate them without too much code duplication with typeORM.
The SQL where condition should be something like this:
WHERE "Table"."id" = $1
AND
"Table"."notAvailable" IS NULL
AND
(
"Table"."date" > $2
OR
(
"Table"."date" = $2
AND
"Table"."myId" > $3
)
)
AND
(
"Table"."created" = $2
OR
"Table"."updated" = $4
)
AND
(
"Table"."text" ilike '%search%'
OR
"Table"."name" ilike '%search%'
)
But with FindConditions it doesn't seem possible to nest them, so I would have to list every possible combination of AND in a FindConditions array. And it isn't possible to split it into .where() and .andWhere() because andWhere can't take an object literal.
Is there another possibility to achieve this query with typeORM without using Raw SQL?

When using the queryBuilder I would recommend using Brackets
as stated in the Typeorm doc: https://typeorm.io/#/select-query-builder/adding-where-expression
You could do something like:
createQueryBuilder("user")
.where("user.registered = :registered", { registered: true })
.andWhere(new Brackets(qb => {
qb.where("user.firstName = :firstName", { firstName: "Timber" })
.orWhere("user.lastName = :lastName", { lastName: "Saw" })
}))
that will result with:
SELECT ...
FROM users user
WHERE user.registered = true
AND (user.firstName = 'Timber' OR user.lastName = 'Saw')
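To make the nesting concrete, here is a small TypeScript sketch of the idea behind Brackets: every nested group is rendered inside its own parentheses, so an OR inside a group cannot leak into the surrounding AND chain. This is not TypeORM code; the Cond type and the leaf, and, or, and render helpers are hypothetical names used only for illustration, and the alias t is an assumption.

```typescript
// A tiny expression tree for WHERE clauses.
// Hypothetical helpers for illustration, not part of the TypeORM API.
type Cond =
  | { kind: "leaf"; sql: string }
  | { kind: "group"; op: "AND" | "OR"; parts: Cond[] };

const leaf = (sql: string): Cond => ({ kind: "leaf", sql });
const and = (...parts: Cond[]): Cond => ({ kind: "group", op: "AND", parts });
const or = (...parts: Cond[]): Cond => ({ kind: "group", op: "OR", parts });

// Render a condition; every nested group is wrapped in parentheses.
function render(cond: Cond, top = true): string {
  if (cond.kind === "leaf") return cond.sql;
  const body = cond.parts.map((p) => render(p, false)).join(` ${cond.op} `);
  return top ? body : `(${body})`;
}

// The condition from the question, written with named parameters.
const where = and(
  leaf("t.id = :id"),
  leaf("t.notAvailable IS NULL"),
  or(
    leaf("t.date > :date"),
    and(leaf("t.date = :date"), leaf("t.myId > :myId"))
  ),
  or(leaf("t.created = :date"), leaf("t.updated = :updated")),
  or(leaf("t.text ILIKE :search"), leaf("t.name ILIKE :search"))
);

console.log(render(where));
```

With the real query builder, each or(...) group would become one new Brackets(...) callback, and the innermost and(...) would be a Brackets nested inside it.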

I think you are mixing two ways of retrieving entities in TypeORM: find from the repository, and the query builder. FindConditions are used by the find function; andWhere is used by the query builder. When building more complex queries, it is generally better and easier to use the query builder.
Query builder
When using the query builder you get much more freedom to make sure the query is what you need it to be. With where you are free to add any SQL you please:
const desiredEntity = await connection
.getRepository(User)
.createQueryBuilder("user")
.where("user.id = :id", { id: 1 })
.andWhere("user.date > :date OR (user.date = :date AND user.myId > :myId)",
{
date: specificCreatedAtDate,
myId: mysteryId,
})
.getOne();
Note that the SQL you write here must be compatible with the database you use. That is also a possible drawback of this method: it ties your project to a specific database. If you are using relations, also read up on the table aliases you can set; they come in handy.
Repository
You already saw that this is much less comfortable. This is because the find function, or more specifically its find options, uses objects to build the where clause. This makes it harder to offer a proper interface for nesting AND and OR clauses side by side. Therefore (I assume) they chose to split AND and OR clauses: the keys inside one object are combined with AND, and the objects in the array are combined with OR. This makes the interface much more declarative, but it means that you have to pull your OR clauses to the top:
// Each object is one AND group; the array ORs the groups together.
const desiredEntity = await repository.find({
where: [{
id: id,
notAvailable: IsNull(),
date: MoreThan(date)
},{
id: id,
notAvailable: IsNull(),
date: date,
myId: MoreThan(myId)
}]
})
Looking at the size of the desired query, I cannot imagine that this code would be very performant.
Alternatively you could use the Raw find helper. This would require you to rewrite your clause per field, since you will only get access to the one alias at a time. You could guess the column names or aliases but this would be very poor practice and very unstable since you cannot directly control this easily.

If you want to add andWhere clauses only when a condition is met, here is an example:
async getTasks(filterDto: GetTasksFilterDto, user: User): Promise<Task[]> {
const { status, search } = filterDto;
/* create a query using the query builder */
// task is what refer to the Task entity
const query = this.createQueryBuilder('task');
// only get the tasks that belong to the user
query.where('task.userId = :userId', { userId: user.id });
/* if status is defined then add a where clause to the query */
if (status) {
// :<variable-name> is a placeholder for the second object key value pair
query.andWhere('task.status = :status', { status });
}
/* if search is defined then add a where clause to the query */
if (search) {
query.andWhere(
/*
LIKE: find a similar match (doesn't have to be exact)
- https://www.w3schools.com/sql/sql_like.asp
LOWER is a SQL function
- https://www.w3schools.com/sql/func_sqlserver_lower.asp
* bug: the search bypassed the userId filter; fix: wrap the whole andWhere condition in ()
because andWhere stitches the where clauses together, the () are needed to keep the OR and LIKE parts together as a single condition
*/
'(LOWER(task.title) LIKE LOWER(:search) OR LOWER(task.description) LIKE LOWER(:search))',
// :search is like a param variable, and the search object is the key value pair. Both have to match
{ search: `%${search}%` },
);
}
/* execute the query
- getMany means that you are expecting an array of results
*/
let tasks;
try {
tasks = await query.getMany();
} catch (error) {
this.logger.error(
`Failed to get tasks for user "${
user.username
}", Filters: ${JSON.stringify(filterDto)}`,
error.stack,
);
throw new InternalServerErrorException();
}
return tasks;
}
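The parentheses in the search condition above matter because AND binds more tightly than OR in SQL, the same way && binds more tightly than || in TypeScript. The following in-memory sketch shows how the unparenthesized form lets rows from other users through. The TaskRow shape and the sample rows are invented for the example, and String.includes is only a stand-in for LIKE.

```typescript
// Hypothetical row shape and sample data, for illustration only.
interface TaskRow {
  userId: number;
  title: string;
  description: string;
}

const taskRows: TaskRow[] = [
  { userId: 1, title: "Clean room", description: "weekly" },
  { userId: 2, title: "Pay bills", description: "clean up the backlog" },
  { userId: 2, title: "Shop", description: "groceries" },
];

const currentUserId = 1;
const searchTerm = "clean";
const has = (text: string): boolean => text.toLowerCase().includes(searchTerm);

// Like: userId = 1 AND title LIKE ... OR description LIKE ...
// The OR branch is evaluated on its own, so the user filter is bypassed.
const withoutParens = taskRows.filter(
  (t) => t.userId === currentUserId && has(t.title) || has(t.description)
);

// Like: userId = 1 AND (title LIKE ... OR description LIKE ...)
const withParens = taskRows.filter(
  (t) => t.userId === currentUserId && (has(t.title) || has(t.description))
);

console.log(withoutParens.map((t) => t.title)); // includes a task of user 2
console.log(withParens.map((t) => t.title)); // only tasks of user 1
```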

I have a list of
{
date: specificCreatedAtDate,
userId: mysteryId
}
My solution is
.andWhere(
new Brackets((qb) => {
qb.where(
'userTable.date = :date0 AND userTable.userId = :userId0',
{
date0: dates[0].date,
userId0: dates[0].userId,
}
);
for (let i = 1; i < dates.length; i++) {
qb.orWhere(
`userTable.date = :date${i} AND userTable.userId = :userId${i}`,
{
[`date${i}`]: dates[i].date,
[`userId${i}`]: dates[i].userId,
}
);
}
})
)
That produces a filter similar to:
const userEntity = await repository.find({
where: [{
userId: id0,
date: date0
},{
userId: id1,
date: date1
}
....
]
})
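The per-index parameter names are the important part of this approach: every pair needs its own :dateN and :userIdN, otherwise later values overwrite earlier ones. Here is a minimal TypeScript sketch of that bookkeeping on its own, without TypeORM. The helper name buildPairFilter and the DateUserPair type are hypothetical, and the alias is assumed to be a trusted constant, never user input.

```typescript
// Hypothetical input shape: one (date, userId) pair per row to match.
interface DateUserPair {
  date: string;
  userId: number;
}

interface PairFilter {
  sql: string;
  params: Record<string, string | number>;
}

// Builds "(a AND b) OR (a AND b) ..." with uniquely named parameters.
// buildPairFilter is an illustrative helper, not part of TypeORM.
function buildPairFilter(alias: string, pairs: DateUserPair[]): PairFilter {
  const params: Record<string, string | number> = {};
  const clauses = pairs.map((pair, i) => {
    params[`date${i}`] = pair.date;
    params[`userId${i}`] = pair.userId;
    return `(${alias}.date = :date${i} AND ${alias}.userId = :userId${i})`;
  });
  // An empty list should match nothing rather than everything.
  const sql = clauses.length > 0 ? clauses.join(" OR ") : "1 = 0";
  return { sql, params };
}

const pairFilter = buildPairFilter("userTable", [
  { date: "2020-01-01", userId: 7 },
  { date: "2020-01-02", userId: 9 },
]);

console.log(pairFilter.sql);
console.log(pairFilter.params);
```

Handling the empty list explicitly avoids reading dates[0] when there is nothing to match. The resulting string would still need to go inside parentheses or a Brackets callback when combined with other conditions.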

Counting cascaded distincts in RavenDB

I'm facing an index problem for which I can't see a solution yet.
I have the following document structure per board:
{
"Name": "Test Board",
...
"Settings": {
"Admins": [ "USER1", "USER2" ],
"Members": [ "USER3", "USER4", "USER5" ]
...
},
...
"CreatedBy": "USER1",
"CreatedOn": "2014-09-26T18:14:20.0858945"
...
}
Now I'd like to retrieve the count of all users who are registered somewhere in a board. Of course this should not count the number of user occurrences but the number of distinct users. One user can be a member of multiple boards.
This operation should be as fast as possible since it is displayed in a global statistics dashboard visible on each page. Therefore I chose to try it with an index instead of retrieving all boards and their users and doing the work on the client side.
Trying to achieve this by using a Map/Reduce index:
Map = boards => from board in boards
select new
{
Aggregation = "ALL",
Users = new object[]
{
board.CreatedBy,
board.Settings.Admins,
board.Settings.Members
},
NumberOfUsers = 1
};
Reduce = results => from res in results
group res by new
{
res.Aggregation
}
into g
select new
{
g.Key.Aggregation,
Users = g.Select(x => x.Users),
NumberOfUsers = g.Sum(x => x.Users.Length)
};
Obviously this results in a wrong count. I don't have any experience with Reduce yet so I appreciate any tip! The solution will be probably pretty easy...
What would be the best way to globally distinct CreatedBy, Admins and Members of all documents and return the count?

Use an index like this:
from board in docs.Boards
select new
{
Users = board.Settings.Admins.Count + board.Settings.Members.Count + 1 /* created by */
}
from r in results
group r by "all" into g
select new
{
Users = g.Sum(x=>x.Users)
}

The best I could come up with so far is:
Map = boards => from board in boards
select new
{
Users = new object[]
{
board.CreatedBy,
board.Settings.Admins,
board.Settings.Members
}
};
Reduce = results => from r in results
group r by "all" into g
select new
{
Users = g.SelectMany(x => x.Users)
};
And then query for the distinct user count:
var allUsersQuery = _documentSession.Query<AllUsersIndex.Result, AllUsersIndex>();
return allUsersQuery.Any() ? allUsersQuery.First().Users.Distinct().Count() : 0;
At least the query only returns a list of all usernames on all boards instead of bigger object trees. But the uniqueness still has to be done client-side.
If there is any better way please let me know. It would be beautiful to have only one integer returned from the server...

Then use this:
from board in docs.Boards
from user in board.Settings.Admins.Concat(board.Settings.Members).Concat(new[]{board.CreatedBy})
select new
{
User = user,
Count = 1
}
from r in results
group r by r.User into g
select new
{
User = g.Key,
Count = g.Sum(x=>x.Count)
}
I'm not really happy about the fanout, but this will give you all the distinct users and the number of times they appear.
If you want just the number of distinct users, just get the total results from the index.
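To see why the total number of results equals the number of distinct users, here is an in-memory TypeScript sketch of the fan-out map and the group-by-user reduce above. It is only a simulation: the Board shape follows the document in the question, while the sample boards and the helper names (mapBoards, reduceUsers) are made up.

```typescript
// Board shape taken from the question; sample data is invented.
interface Board {
  CreatedBy: string;
  Settings: { Admins: string[]; Members: string[] };
}

interface UserCount {
  User: string;
  Count: number;
}

// Map with fan-out: one { User, Count: 1 } entry per user mention per board.
function mapBoards(boards: Board[]): UserCount[] {
  const entries: UserCount[] = [];
  for (const board of boards) {
    const users = [
      ...board.Settings.Admins,
      ...board.Settings.Members,
      board.CreatedBy,
    ];
    for (const user of users) entries.push({ User: user, Count: 1 });
  }
  return entries;
}

// Reduce: group by user and sum the counts, giving one result per distinct user.
function reduceUsers(entries: UserCount[]): UserCount[] {
  const totals = new Map<string, number>();
  for (const e of entries) {
    totals.set(e.User, (totals.get(e.User) ?? 0) + e.Count);
  }
  return Array.from(totals, ([User, Count]) => ({ User, Count }));
}

const sampleBoards: Board[] = [
  {
    CreatedBy: "USER1",
    Settings: { Admins: ["USER1", "USER2"], Members: ["USER3", "USER4", "USER5"] },
  },
  {
    CreatedBy: "USER3",
    Settings: { Admins: ["USER3"], Members: ["USER1", "USER6"] },
  },
];

const reducedUsers = reduceUsers(mapBoards(sampleBoards));
const distinctUserCount = reducedUsers.length; // plays the role of TotalResults

console.log(reducedUsers);
console.log(distinctUserCount);
```

Note that Count is the number of mentions, so a user who is both creator and admin of the same board is counted twice there; the number of results is unaffected.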

PouchDB Query like sql

With CouchDB it is possible to run queries "like" SQL. http://guide.couchdb.org/draft/cookbook.html says:
How you would do this in SQL:
SELECT field FROM table WHERE value="searchterm"
How you can do this in CouchDB:
Use case: get a result (which can be a record or set of records) associated with a key ("searchterm").
To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.
To look up a value by "searchterm", we need to put all values into the key of a view. All we need is a simple map function:
function(doc) {
if(doc.value) {
emit(doc.value, null);
}
}
This creates a list of documents that have a value field sorted by the data in the value field. To find all the records that match "searchterm", we query the view and specify the search term as a query parameter:
/database/_design/application/_view/viewname?key="searchterm"
How can I do this with PouchDB? The API provides methods to create a temporary view, but how can I customize the request with key="searchterm"?

You just add your attribute settings to the options object:
var searchterm = "boop";
db.query({map: function(doc) {
if(doc.value) {
emit(doc.value, null);
}
}}, { key: searchterm }, function(err, res) { ... });
see http://pouchdb.com/api.html#query_database for more info
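For intuition about what the key option does, here is a plain TypeScript sketch that mimics the view in memory: the map function emits one row per document keyed by doc.value, and the key option keeps only the rows whose key matches exactly. This is a simulation, not PouchDB code; the ViewDoc and ViewRow types, buildView, queryByKey, and the sample documents are all hypothetical, and a linear filter stands in for the real index lookup.

```typescript
// Hypothetical document and row shapes for the simulation.
interface ViewDoc {
  _id: string;
  value?: string;
}

interface ViewRow {
  id: string;
  key: string;
  value: null;
}

// Build the view: run the map logic over every doc and sort rows by key.
function buildView(docs: ViewDoc[]): ViewRow[] {
  const rows: ViewRow[] = [];
  for (const doc of docs) {
    const emit = (key: string, value: null): void => {
      rows.push({ id: doc._id, key, value });
    };
    // Same logic as the map function above.
    if (doc.value) emit(doc.value, null);
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Equivalent of passing { key: searchterm }: exact match on the emitted key.
function queryByKey(view: ViewRow[], key: string): ViewRow[] {
  return view.filter((row) => row.key === key);
}

const viewDocs: ViewDoc[] = [
  { _id: "a", value: "boop" },
  { _id: "b", value: "beep" },
  { _id: "c" }, // no value field: nothing emitted
  { _id: "d", value: "boop" },
];

const matchingRows = queryByKey(buildView(viewDocs), "boop");
console.log(matchingRows.map((row) => row.id));
```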

Using a regex with pouchdb-find:
import PouchDB from 'pouchdb';
import PouchDBFind from 'pouchdb-find';
...
PouchDB.plugin(PouchDBFind)
const db = new PouchDB(dbName);
db.createIndex({index: {fields: ['description']}})
....
const {docs, warning} = await db.find({selector: { description: { $regex: /OVO/}}})

facets with ravendb

I am trying to work with the facet feature in RavenDB but I am getting strange results.
I have documents like:
{
"SearchableModel": "42LC2RR ",
"ModelName": "42LC2RR",
"ModelID": 490578,
"Name": "LG 42 Television 42LC2RR",
"Desctription": "fffff",
"Image": "1/4/9/8/18278941c",
"MinPrice": 9400.0,
"MaxPrice": 9400.0,
"StoreAmounts": 1,
"AuctionAmounts": 0,
"Popolarity": 3,
"ViewScore": 0.0,
"ReviewAmount": 2,
"ReviewScore": 45,
"Sog": "E-TV",
"SogID": 1,
"IsModel": true,
"Manufacrurer": "LG",
"ParamsList": [
"1994267",
"46570",
"4134",
"4132",
"4118",
"46566",
"4110",
"180676",
"239517",
"750771",
"2658507",
"2658498",
"46627",
"4136",
"169941",
"169846",
"145620",
"169940",
"141416",
"3190767",
"3190768",
"144720",
"2300706",
"4093",
"4009",
"1418470",
"179766",
"190025",
"170557",
"170189",
"43768",
"4138",
"67976",
"239516",
"3190771",
"141195"
],
}
Each entry in ParamsList represents a property of the product, and our application caches what each param represents.
When searching for a specific product, I would like to count all the returned attributes so that I can show the number of items next to each one after the search.
After searching for lg in the televisions category, I want to get:
Param: 4134, which represents LCD, and the amount: 65.
Unfortunately, I am getting strange results: only some params are counted and others are not.
For some searches that do return results, I don't get any amounts back.
I am using the latest stable version of RavenDB.
index :
from doc in docs
from param in doc.ParamsList
select new {Name=doc.Name,Description=doc.Description,SearchNotVisible = doc.SearchNotVisible,SogID=doc.SogID,Param =param}
facet :
DocumentStore documentStore = new DocumentStore { ConnectionStringName = "Server" };
documentStore.Initialize();
using (IDocumentSession session = documentStore.OpenSession())
{
List<Facet> _facets = new List<Facet>
{
new Facet {Name = "Param"}
};
session.Store(new FacetSetup { Id = "facets/Params", Facets = _facets });
session.SaveChanges();
}
usage example :
IDictionary<string, IEnumerable<FacetValue>> facets = session.Advanced.DatabaseCommands.GetFacets("FullIndexParams", new IndexQuery { Query = "Name:lg" }, "facets/Params");
I tried many variations without success.
Does anyone have an idea what I am doing wrong?
Thanks

Use this index, it should resolve your problem:
from doc in docs
select new {Name=doc.Name,Description=doc.Description,SearchNotVisible = doc.SearchNotVisible,SogID=doc.SogID,Param = doc.ParamsList}
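As a reference for what the facet on Param is expected to return, here is an in-memory TypeScript sketch of the counting: for every product matching the search, each of its params is counted once. It is only a simulation with made-up sample products; the Product type and the facetCounts helper are hypothetical, and the whole-word match is a simplified stand-in for the Name:lg query.

```typescript
// Reduced product shape; the sample data below is invented.
interface Product {
  Name: string;
  ParamsList: string[];
}

// Count how many matching products carry each param value.
function facetCounts(products: Product[], nameTerm: string): Map<string, number> {
  const counts = new Map<string, number>();
  const term = nameTerm.toLowerCase();
  for (const product of products) {
    // Simplified stand-in for the Name query: case-insensitive whole-word match.
    const words = product.Name.toLowerCase().split(/\s+/);
    if (words.indexOf(term) === -1) continue;
    // Count each param at most once per product.
    Array.from(new Set(product.ParamsList)).forEach((param) => {
      counts.set(param, (counts.get(param) ?? 0) + 1);
    });
  }
  return counts;
}

const sampleProducts: Product[] = [
  { Name: "LG 42 Television 42LC2RR", ParamsList: ["4134", "4132"] },
  { Name: "LG 32 Television 32LC2R", ParamsList: ["4134"] },
  { Name: "Sony 40 Television", ParamsList: ["4134", "4132"] },
];

const lgFacets = facetCounts(sampleProducts, "lg");
console.log(Array.from(lgFacets.entries()));
```

Comparing numbers like these against the facet results for a small, known data set is one way to tell whether the index or the query is dropping params.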

Which analyzer did you set for the "Name" field? I see you search by Name "lg". By default, RavenDB uses a keyword analyzer, which indexes the whole field value as a single term, so you must search by the exact name. You should set another analyzer for the Name or Description field (StandardAnalyzer, for example).
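The difference between the two kinds of analyzer can be shown with a toy TypeScript sketch. These two functions are deliberately simplified stand-ins, not the Lucene implementations: the keyword-style one keeps the whole value as a single term, and the standard-like one lower-cases the value and splits it into words.

```typescript
type Analyzer = (text: string) => string[];

// Simplified stand-ins for Lucene analyzers; the real ones do more.
const keywordStyleAnalyzer: Analyzer = (text) => [text];
const standardLikeAnalyzer: Analyzer = (text) =>
  text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);

// A term query matches only if the term equals one of the indexed terms.
function termMatches(analyze: Analyzer, fieldValue: string, term: string): boolean {
  return analyze(fieldValue).indexOf(term) !== -1;
}

const productName = "LG 42 Television 42LC2RR";

console.log(keywordStyleAnalyzer(productName)); // one term: the whole name
console.log(standardLikeAnalyzer(productName)); // one term per word
console.log(termMatches(keywordStyleAnalyzer, productName, "lg")); // false
console.log(termMatches(standardLikeAnalyzer, productName, "lg")); // true
```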

Map/Reduce over sharded data with RavenDB

I'm having trouble getting a map reduce sample to work when the data is sharded across two nodes. I'm storing documents that relate to application errors being logged on two local ravenDB nodes, the error documents look like:
Example of document on node 1, there are 6 total
errors/1/6
{
"UniqueId": "c62c7e30-8ec7-45af-88e4-da023d796727",
"ApplicationName": "MyAppName"
}
Example of document on node 2, there are 7 total
errors/2/6 --Error stored on shard node 2
{
"UniqueId": "7e0b0f87-9d75-4e70-9fa0-d64a18bc88dc",
"ApplicationName": "MyAppName"
}
when I run this query:
public class ApplicationNames : AbstractIndexCreationTask<ErrorDocument, Application>
{
public ApplicationNames()
{
Map = errors => from error in errors
select new { error.ApplicationName, Count = 1 };
Reduce = results => from error in results
group error by new { error.ApplicationName, error.Count } into g
select new { g.Key.ApplicationName, Count = g.Sum(x=> x.Count) };
}
}
I'm getting back 2 results; one with a Count of 6, the second with a Count of 7. I was expecting that the two results from each shard would be combined into one result with a count of 13. Not sure if I'm doing something wrong or if that's not how it's supposed to work. I followed the example at http://ravendb.net/documentation/docs-sharding to set up the sharding strategy.

Grant,
RavenDB currently doesn't handle reduce over multiple nodes.
You can do that yourself using:
session.Query<Application, ApplicationNames>()
.ToList()
.Select(new ApplicationNames().Reduce)
.ToList();
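The idea of that client-side step is to run the reduce once more over the per-shard results. Here is a minimal TypeScript sketch of the combination, independent of the RavenDB client; the ShardResult type and the combineShardResults helper are hypothetical names, and the counts are the ones from the question.

```typescript
// One reduced result as returned by a single shard.
interface ShardResult {
  ApplicationName: string;
  Count: number;
}

// Re-reduce on the client: group by ApplicationName across all shards and sum.
function combineShardResults(perShard: ShardResult[][]): ShardResult[] {
  const totals = new Map<string, number>();
  for (const shard of perShard) {
    for (const row of shard) {
      totals.set(row.ApplicationName, (totals.get(row.ApplicationName) ?? 0) + row.Count);
    }
  }
  return Array.from(totals, ([ApplicationName, Count]) => ({ ApplicationName, Count }));
}

const combinedResults = combineShardResults([
  [{ ApplicationName: "MyAppName", Count: 6 }], // node 1
  [{ ApplicationName: "MyAppName", Count: 7 }], // node 2
]);

console.log(combinedResults); // a single entry for MyAppName with Count 13
```

This works because summing counts is associative: reducing already-reduced results gives the same total as reducing all the raw map entries at once.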
