Counting cascaded distincts in RavenDB - ravendb

I'm facing an index problem for which I can't see a solution yet.
I have the following document structure per board:
{
"Name": "Test Board",
...
"Settings": {
"Admins": [ "USER1", "USER2" ],
"Members": [ "USER3", "USER4", "USER5" ]
...
},
...
"CreatedBy": "USER1",
"CreatedOn": "2014-09-26T18:14:20.0858945"
...
}
Now I'd like to be able to retrieve the count of all users which are somewhere registered in a board. Of course this should not only count the number of user occurences but rather count the number of distinct users. One user can be member of multiple boards.
This operation should perform as fast as possible since it is displayed in a global statistics dashboard visible on each page. Therefor I chose to try it with an index instead of retrieving all boards and their users and do the work on client side.
Trying to achieve this by using a Map/Reduce index:
Map = boards => from board in boards
select new
{
Aggregation = "ALL",
Users = new object[]
{
board.CreatedBy,
board.Settings.Admins,
board.Settings.Members
},
NumberOfUsers = 1
};
Reduce = results => from res in results
group res by new
{
res.Aggregation
}
into g
select new
{
g.Key.Aggregation,
Users = g.Select(x => x.Users),
NumberOfUsers = g.Sum(x => x.Users.Length)
};
Obviously this results in a wrong count. I don't have any experience with Reduce yet so I appreciate any tip! The solution will be probably pretty easy...
What would be the best way to globally distinct CreatedBy, Admins and Members of all documents and return the count?

Use an index like this:
from board in docs.Boards
select new
{
Users = board.Settings.Admins.Count + board.Settings.Members.Count + 1 /* created by */
}
from r in results
group r by "all" into g
select new
{
Users = g.Sum(x=>x.Users)
}

The best I could come up so far is:
Map = boards => from board in boards
select new
{
Users = new object[]
{
board.CreatedBy,
board.Settings.Admins,
board.Settings.Members
}
};
Reduce = results => from r in results
group r by "all" into g
select new
{
Users = g.SelectMany(x => x.Users)
};
And then query for the distinct user count:
var allUsersQuery = _documentSession.Query<AllUsersIndex.Result, AllUsersIndex>();
return allUsersQuery.Any() ? allUsersQuery.First().Users.Distinct().Count() : 0;
At least the query only returns a list of all usernames on all boards instead of bigger object trees. But the uniqueness still has to be done client-side.
If there is any better way please let me know. It would be beautiful to have only one integer returned from the server...

Then use this:
from board in docs.Boards
from user in board.Settings.Admins.Concat(board.Settings.Members).Concat(new[]{board.CreatedBy})
select new
{
User = user,
Count = 1
}
from r in results
group r by r.User into g
select new
{
User = g.Key,
Count = g.Sum(x=>x.Count)
}
I'm not really happy about the fanout, but this will give you all the discint users and the number of times they appear.
If you want just the number of distinct users, just get the total results from the index.

Related

Find entity with most relations filtered by criteria

model Player {
id String #id
name String #unique
game Game[]
}
model Game {
id String #id
isWin Boolean
playerId String
player Player #relation(fields: [playerId], references: [id])
}
I would like to find a player with most wins. How would I do that with prisma? If there is no prisma "native" way to do it, what is the most efficient way to do this with raw SQL?
The best I could think of is:
prisma.player.findMany({
include: {
game: {
where: {
isWin: true,
},
},
},
})
But it has huge downside that you need to filter and order results in Node manually, and also store all results in memory while doing so.
Using the groupBy API you can find the player with the most wins using two queries.
1. Activate orderByAggregateGroup
You'l need to use the orderByAggregateGroup preview feature and use Prisma version 2.21.0 or later.
Update your Prisma Schema as follows
generator client {
provider = "prisma-client-js"
previewFeatures = ["orderByAggregateGroup"]
}
// ... rest of schema
2. Find playerId of most winning player
Use a groupBy query to do the following:
Group games by the playerId field.
Find the count of game records where isWin is true.
Order them in descending order by the count mentioned in 2.
Take only 1 result (since we want the player with the most wins. You can change this to get the first-n players as well).
The combined query looks like this:
const groupByResult = await prisma.game.groupBy({
by: ["playerId"],
where: {
isWin: true,
},
_count: {
isWin: true,
},
orderBy: {
_count: {
isWin: "desc",
},
},
take: 1, // change to "n" you want to find the first-n most winning players.
});
const mostWinningPlayerId = groupByResult[0].playerId;
I would suggest checking out the Group By section of the Aggregation, grouping, and summarizing article in the prisma docs, which explains how group by works and how to use it with filtering and ordering.
3. Query player data with findUnique
You can trivially find the player using a findUnique query as you have the id.
const mostWinningPlayer = await prisma.player.findUnique({
where: {
id: mostWinningPlayerId,
},
});
Optionally, if you want the first "n" most winning players, just put the appropriate number in the take condition of the first groupBy query. Then you can do a findMany with the in operator to get all the player records. If you're not sure how to do this, feel free to ask and I'll clarify with sample code.

Making a select with an array

Hello I have and Array with objects, each object have atributes that I need for an select:
In this case it is the result from another consult with typeorm
" const CompaniesRelation: Array = await getRepository(CompanyRelation).find({ where:{ UserId: data.UserId, IsActive: true} });"
Companies: Array = [{CompanyId="a"}{CompanyId="b"}{CompanyId="c"}];
I need to make an select of all the data that matches with the Ids that are into Companies so for that I need to make an SQL like it:
const CompanyData: Array = SELECT *
FROM Company
INNER JOIN Company.CompanyId = CompaniesRelation[].CompanyId;
but it throw me error in typing, ¿how can I acces to each objetc into the array for make that match?
At the final I should traduce it sql to typeOrm, but I new and solving first in SQL it should help me to traduce to typeorm
Okay great, let us consider what we have to work with right:
So first we have a statement that gets a list of companies like so:
const CompaniesRelation: Array = await getRepository(CompanyRelation).find({
where: {
UserId: data.UserId,
IsActive: true
}
});
which ends up with something like this:
[ { CompanyId: 'a' }, { CompanyId: 'b' }, { CompanyId: 'c' } ]
Now we want to get a list of companies from an SQL DB with these Company IDs.
So the query should look like this:
// so first we re map the relation to an array of strings...
const ids: Array<string> = CompaniesRelation.map(c => c.CompanyId);
// then use it in the query, note the string interpolation for the query
const query: string = `SELECT * FROM Company WHERE CompanyId IN(${JSON.stringify(ids).slice(1, -1)});`;
I don't think this will cover the scope of the problem you have, I hope it helps though...feel free to ask

Search Query Not Handling Large Number of Users

We have an asp.net-core website which handles users search as follows:
public async Task<ICollection<UserSearchResult>> SearchForUser(string name, int page)
{
return await db.ApplicationUsers.Where(u => u.Name.Contains(name) && !u.Deleted && u.AppearInSearch)
.OrderByDescending(u => u.Verified)
.Skip(page * recordsInPage)
.Take(recordsInPage)
.Select(u => new UserSearchResult()
{
Name = u.Name,
Verified = u.Verified,
PhotoURL = u.PhotoURL,
UserID = u.Id,
Subdomain = u.Subdomain
}).ToListAsync();
}
The query translates to something similar to the following:
SELECT [t].[Name], [t].[Verified], [t].[PhotoURL], [t].[Id], [t].[Subdomain] FROM (SELECT [u0].* FROM [AspNetUsers] AS [u0] WHERE (((CHARINDEX('khaled', [u0].[Name]) > 0) OR ('khaled' = N'')) AND ([u0].[Deleted] = 0)) AND ([u0].[AppearInSearch] = 1) ORDER BY [u0].[Verified] DESC OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY ) AS [t]
In Client-Side we use typeahead and bloodhound as follows:
engine = new Bloodhound({
identify: function (user) {
return user.UserID;
},
queryTokenizer: Bloodhound.tokenizers.whitespace,
datumTokenizer: Bloodhound.tokenizers.obj.whitespace('name'),
dupDetector: function (a, b) { return a.UserID === b.UserID; },
remote: {
cache: false,
url: '/account/Search?name=%QUERY&page=0',
wildcard: '%QUERY'
}
});
and we configure typeahead as follows:
$('#demo-input').typeahead(
{
hint: $('.Typeahead-hint'),
menu: $('.Typeahead-menu'),
minLength: 3,
classNames:
{
open: 'is-open',
empty: 'is-empty',
cursor: 'is-active',
suggestion: 'Typeahead-suggestion',
selectable: 'Typeahead-selectable'
}
},
{
source: engineWithDefaults,
displayKey: 'name',
templates:
{
suggestion: template,
empty: empty,
footer: all
},
limit: 5
})
The search works just find on localhost and the query runs great as a sql query.
I have also created an index on Verified and cut the speed to 1 second or less.
Our website has millions of registered users and the problem is that as soon as we make search available for all users the DTU percentage on Azure goes to 100% and the queries timeout.
We also have a redis cache to speed-up similar queries but this didn't help us with this issue.
Your support is appreciated :)
It's quite likely to be u.Name.Contains(name) i.e. CHARINDEX('khaled', [u0].[Name]) > 0 which will have to scan the entire table or, at best, the index. That will be slow and there's not much you can do about it.
If you have a large bias to deleted or appearInSearch you might be able to use a conditional index but these types of searches are notoriously slow. You will need some special constructs to make this work.

Sequelize js get count of associated model

Say for example I have a discussions table and replies table.
How do I get the count of replies for each discussion record ?
Models.course_discussions.count({
where: { 'course_id': 105 },
include: [
{model: Models.course_discussions_replies, as: 'replies'}
]
})
.then(function (discussions) {
if (!discussions) {
reply(Hapi.error.notFound());
} else {
console.log("91283901230912830812 " , discussions);
}
});
The above code is converted into the following query -
SELECT COUNT(`course_discussions`.`id`) as `count` FROM `course_discussions` LEFT OUTER JOIN `course_discussions_replies` AS `replies` ON `course_discussions`.`id` = `replies`.`discussion_id` WHERE `course_discussions`.`course_id`=105;
The above code gets me count of discussions. But how do I get the count of the replies for each discussion ?
The following sql query works, but how do I write it in the sequelize way ?
SELECT COUNT( `replies`.`discussion_id` ) AS `count`
FROM `course_discussions`
LEFT OUTER JOIN `course_discussions_replies` AS `replies` ON `course_discussions`.`id` = `replies`.`discussion_id`
WHERE `course_discussions`.`course_id` =105
If you need the discussions and the replies count maybe .replies.length should work for you.
Models.course_discussions.findAll(
where: { 'course_id': 105 },
include: [
{model: Models.course_discussions_replies, as: 'replies'}
]
})
.then(function (discussions) {
// each discussion in discussions will have a
// replies array
discussions[0].repies.length // replies for current discussion
});

RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
new BulkInsertOptions{ CheckForUpdates = true});
foreach(MyDocument myDoc in docCollection)
_bulk.Store(myDoc);
_bulk.Dispose();
And right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number which is less than the count I see in raven studio.
By default, the query you are doing limits to a sane number of results, part of RavenDB's promise to be safe by default and not stream back millions of records.
In order to get the number of a specific type of document in yoru database, you need a special map-reduce index whose job it is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define this in Raven Studio instead of trying to create it with code.
The source for that index is in this question but I'll copy it here:
// Index Name: Raven/DocumentCollections
// Map Query
from doc in docs
let Name = doc["#metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}
// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
public string Name { get; set; }
public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
.Where(x => x.Name == "MyDocument")
.FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously so after bulk-inserting a bunch of documents, the index may be stale. You can force it to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName which exists for every database, by sidestepping normal queries and getting metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
new Raven.Abstractions.Data.IndexQuery
{
Query = "Tag:MyDocument",
PageSize = 0
},
includes: null,
metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
Results: [ ],
Includes: [ ],
IsStale: false,
IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
TotalResults: 3,
SkippedResults: 0,
IndexName: "Raven/DocumentsByEntityName",
IndexEtag: "01000000-0000-0040-0000-00000000000B",
ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
Highlightings: { },
NonAuthoritativeInformation: false,
LastQueryTime: "2014-02-06T18:12:56.1990451Z",
DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
Session.Query<Course>()
.Statistics(out stats)
// Rest of query