Map/Reduce over sharded data with RavenDB

I'm having trouble getting a Map/Reduce sample to work when the data is sharded across two nodes. I'm storing documents that describe application errors on two local RavenDB nodes. The error documents look like this:
Example of a document on node 1 (there are 6 in total):
errors/1/6
{
  "UniqueId": "c62c7e30-8ec7-45af-88e4-da023d796727",
  "ApplicationName": "MyAppName"
}
Example of a document on node 2 (there are 7 in total):
errors/2/6 --Error stored on shard node 2
{
  "UniqueId": "7e0b0f87-9d75-4e70-9fa0-d64a18bc88dc",
  "ApplicationName": "MyAppName"
}
When I query this index:
public class ApplicationNames : AbstractIndexCreationTask<ErrorDocument, Application>
{
    public ApplicationNames()
    {
        Map = errors => from error in errors
                        select new { error.ApplicationName, Count = 1 };
        Reduce = results => from error in results
                            group error by new { error.ApplicationName, error.Count } into g
                            select new { g.Key.ApplicationName, Count = g.Sum(x => x.Count) };
    }
}
I'm getting back 2 results: one with a Count of 6, the other with a Count of 7. I was expecting the results from the two shards to be combined into a single result with a Count of 13. I'm not sure whether I'm doing something wrong or whether that's not how it's supposed to work. I followed the example at http://ravendb.net/documentation/docs-sharding to set up the sharding strategy.

Grant,
RavenDB currently doesn't handle reduce over multiple nodes.
You can do that yourself using:
session.Query<Application, ApplicationNames>()
    .ToList()
    .Select(new ApplicationNames().Reduce)
    .ToList();
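To make the idea of a client-side reduce concrete, here is a minimal sketch in plain JavaScript, not RavenDB client code. It assumes the two per-shard results have already been fetched and have the shape reported in the question; the names shardResults and mergeShardResults are hypothetical. Note that it groups by ApplicationName alone: if Count were part of the grouping key, as in the Reduce above, the partial results 6 and 7 would land in different groups and could never be combined.

```javascript
// Hypothetical per-shard reduce results, shaped like the index output in the question.
const shardResults = [
  { ApplicationName: "MyAppName", Count: 6 }, // returned by shard 1
  { ApplicationName: "MyAppName", Count: 7 }, // returned by shard 2
];

// Re-reduce on the client: group by ApplicationName only and sum the partial counts.
function mergeShardResults(results) {
  const totals = new Map();
  for (const { ApplicationName, Count } of results) {
    totals.set(ApplicationName, (totals.get(ApplicationName) || 0) + Count);
  }
  return [...totals].map(([ApplicationName, Count]) => ({ ApplicationName, Count }));
}

const merged = mergeShardResults(shardResults);
console.log(merged); // [ { ApplicationName: 'MyAppName', Count: 13 } ]
```

The same grouping-and-summing step is what the reduce does on each node; the sketch simply runs it one more time over the partial results.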

Related

Multiple MySQL queries returning undefined when outputting value

I am running two database queries to retrieve data that I will output in a message embed. The queries return the proper rows when I dump the entire result to the console. However, whenever I try to output the actual value from one of the rows, it displays as undefined in the message embed.
From the examples I've found, rows[0].somevalue should output the correct result.
let mentionedUser = message.mentions.members.first();
let captainUser = client.users.find(user => user.id == `${mentionedUser.id}`);
con.query(`SELECT * FROM captains WHERE id = '${mentionedUser.id}';SELECT * FROM results WHERE captain = '${captainUser.username}'`, [2, 1], (err, rows) => {
  if (err) throw err;
  console.log(rows);
  const infoEmbed = new Discord.RichEmbed()
    .setColor("#1b56af")
    .setAuthor('Captain Information', client.user.displayAvatarURL)
    .setThumbnail('https://i.imgur.com/t3WuKqf.jpg')
    .addField('Captain Name', `${mentionedUser}`, true)
    .addField('Cap Space', `${rows[0].credits}`, true); // Returns undefined
  message.channel.send(infoEmbed);
});
This is the console result
[ [ RowDataPacket {
id: '91580646270439424',
team_name: 'Resistance',
credits: 85,
roster_size: 2 } ],
[ RowDataPacket { id: 'Sniper0270', captain: 'BTW8892', credits: 10 },
RowDataPacket { id: 'Annex Chrispy', captain: 'BTW8892', credits: 5 } ] ]
In the code posted above, I expected rows[0].credits to output 85. No errors are raised; the value is just displayed as "undefined" in the message embed.

You are executing two queries inside a single query call. It looks like the mysql library returns an array of arrays in this scenario, where the first element is the result set of the first query and the second is the result set of the second query. That is why rows[0].credits is undefined: rows[0] is itself an array of rows, so the value you want is rows[0][0].credits. This is non-standard. Normally you would either execute each query in its own query call or use a UNION to combine the two queries into a single result set.
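As a small illustration of that nesting, the sketch below rebuilds the console output from the question as a plain array (no database connection involved) and shows which index reaches the credits value. The variable names captainRows, resultRows, capSpace and playerIds are hypothetical.

```javascript
// Same shape as the console output in the question: one inner array per statement.
const rows = [
  [{ id: "91580646270439424", team_name: "Resistance", credits: 85, roster_size: 2 }],
  [
    { id: "Sniper0270", captain: "BTW8892", credits: 10 },
    { id: "Annex Chrispy", captain: "BTW8892", credits: 5 },
  ],
];

// Split the outer array into the result set of each statement.
const [captainRows, resultRows] = rows;

const wrong = rows[0].credits;           // undefined: rows[0] is an array, not a row
const capSpace = captainRows[0].credits; // 85: first row of the first result set
const playerIds = resultRows.map(r => r.id);

console.log(wrong, capSpace, playerIds);
```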

This is not a practical way to send a query request. A query is normally a single statement (bulk updates aside), so you should not run two different queries through a single con.query call. Execute them separately.

Sequelize Querying with Op.or and Op.ne with same array of numbers

I'm having trouble getting the correct query with Sequelize.
I have an array representing the ids of entries. Let's say it looks like this:
userVacationsIds = [1,2,3]
I wrote the first query like this:
Vacation.findAll({
  where: {
    id: {
      [Op.or]: userVacationsIds
    }
  }
})
.then(vacationSpec => {
  Vacation.findAll({
    where: {
      // Here I need to get all entries that DON'T have the ids from the array
    }
  })
})
I can't work out the query for the part marked by the comment in my code.
I've tried the Sequelize documentation, but I can't understand how to chain these queries specifically.
I also tried an online converter, but that failed too.
So I just need some help getting this query right, please.
In the end I expect to get 2 arrays: one containing all entries whose ids are in the array, the other containing everything else (that is, entries whose id is NOT in the array).

I figured it out.
I feel silly.
This is the query that worked:
Vacation.findAll({
  where: {
    id: {
      [Op.or]: userVacationsIds
    }
  }
}).then(vacationSpec => {
  Vacation.findAll({
    where: {
      id: {
        [Op.notIn]: userVacationsIds
      }
    }
  })
});
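For comparison, the two expected arrays can also be produced from a single unfiltered result by splitting it on the client. This is a different approach from the two-query answer above, sketched in plain JavaScript without Sequelize; allVacations stands in for the rows an unfiltered Vacation.findAll() might return, and partitionByIds is a hypothetical helper.

```javascript
// Hypothetical rows standing in for the result of an unfiltered Vacation.findAll().
const allVacations = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }];
const userVacationsIds = [1, 2, 3];

// Split rows into those whose id is in `ids` and those whose id is not.
function partitionByIds(rows, ids) {
  const wanted = new Set(ids);
  const matching = [];
  const rest = [];
  for (const row of rows) {
    (wanted.has(row.id) ? matching : rest).push(row);
  }
  return { matching, rest };
}

const { matching, rest } = partitionByIds(allVacations, userVacationsIds);
console.log(matching.map(v => v.id), rest.map(v => v.id));
```

Whether one query plus a client-side split beats two filtered queries depends on how large the table is; for large tables the filtered queries avoid loading every row.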

Counting cascaded distincts in RavenDB

I'm facing an index problem for which I can't see a solution yet.
I have the following document structure per board:
{
"Name": "Test Board",
...
"Settings": {
"Admins": [ "USER1", "USER2" ],
"Members": [ "USER3", "USER4", "USER5" ]
...
},
...
"CreatedBy": "USER1",
"CreatedOn": "2014-09-26T18:14:20.0858945"
...
}
Now I'd like to retrieve the count of all users who are registered anywhere in a board. Of course this should not count the number of user occurrences but the number of distinct users. One user can be a member of multiple boards.
This operation should be as fast as possible, since the result is displayed in a global statistics dashboard visible on each page. Therefore I chose to try it with an index instead of retrieving all boards and their users and doing the work on the client side.
I tried to achieve this with a Map/Reduce index:
Map = boards => from board in boards
                select new
                {
                    Aggregation = "ALL",
                    Users = new object[]
                    {
                        board.CreatedBy,
                        board.Settings.Admins,
                        board.Settings.Members
                    },
                    NumberOfUsers = 1
                };
Reduce = results => from res in results
                    group res by new
                    {
                        res.Aggregation
                    }
                    into g
                    select new
                    {
                        g.Key.Aggregation,
                        Users = g.Select(x => x.Users),
                        NumberOfUsers = g.Sum(x => x.Users.Length)
                    };
Obviously this results in a wrong count. I don't have any experience with Reduce yet, so I appreciate any tip! The solution is probably pretty easy...
What would be the best way to get the distinct set of CreatedBy, Admins and Members across all documents and return its count?

Use an index like this:
// Map
from board in docs.Boards
select new
{
    Users = board.Settings.Admins.Count + board.Settings.Members.Count + 1 /* created by */
}
// Reduce
from r in results
group r by "all" into g
select new
{
    Users = g.Sum(x => x.Users)
}

The best I could come up with so far is:
Map = boards => from board in boards
                select new
                {
                    Users = new object[]
                    {
                        board.CreatedBy,
                        board.Settings.Admins,
                        board.Settings.Members
                    }
                };
Reduce = results => from r in results
                    group r by "all" into g
                    select new
                    {
                        Users = g.SelectMany(x => x.Users)
                    };
And then query for the distinct user count:
var allUsersQuery = _documentSession.Query<AllUsersIndex.Result, AllUsersIndex>();
return allUsersQuery.Any() ? allUsersQuery.First().Users.Distinct().Count() : 0;
At least the query only returns a list of all usernames on all boards instead of bigger object trees. But the uniqueness still has to be done client-side.
If there is any better way please let me know. It would be beautiful to have only one integer returned from the server...

Then use this:
// Map
from board in docs.Boards
from user in board.Settings.Admins.Concat(board.Settings.Members).Concat(new[]{board.CreatedBy})
select new
{
    User = user,
    Count = 1
}
// Reduce
from r in results
group r by r.User into g
select new
{
    User = g.Key,
    Count = g.Sum(x => x.Count)
}
I'm not really happy about the fanout, but this will give you all the distinct users and the number of times they appear.
If you want just the number of distinct users, just get the total results from the index.
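The fanout-then-group idea can be traced in plain JavaScript over in-memory data. This is only an illustration of the logic, not RavenDB index code; the two sample boards are made up, following the document structure from the question.

```javascript
// Made-up sample boards following the document structure in the question.
const boards = [
  { CreatedBy: "USER1", Settings: { Admins: ["USER1", "USER2"], Members: ["USER3", "USER4", "USER5"] } },
  { CreatedBy: "USER2", Settings: { Admins: ["USER2"], Members: ["USER3", "USER6"] } },
];

// Map (fanout): emit one { User, Count: 1 } entry per user mention on each board.
const mapped = boards.flatMap(board =>
  [...board.Settings.Admins, ...board.Settings.Members, board.CreatedBy]
    .map(User => ({ User, Count: 1 }))
);

// Reduce: group by User and sum the counts.
const reduced = new Map();
for (const { User, Count } of mapped) {
  reduced.set(User, (reduced.get(User) || 0) + Count);
}

// One reduce result per distinct user, so the number of results is the distinct count.
const distinctUsers = reduced.size;
console.log(distinctUsers, reduced.get("USER2")); // 6 3
```

Here reduced.size plays the role of the index's total results: the server can report it without sending the individual entries back.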

RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For the insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
    new BulkInsertOptions{ CheckForUpdates = true});
foreach(MyDocument myDoc in docCollection)
    _bulk.Store(myDoc);
_bulk.Dispose();
Right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number that is lower than the count I see in Raven Studio.

By default, the query you are running is limited to a sane number of results. This is part of RavenDB's promise to be safe by default and not stream back millions of records.
To get the number of documents of a specific type in your database, you need a special map-reduce index whose job is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it in code.
The source for that index is in this question, but I'll copy it here:
// Index Name: Raven/DocumentCollections
// Map Query
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name, Count = 1 }
// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x => x.Count) }
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
public string Name { get; set; }
public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
.Where(x => x.Name == "MyDocument")
.FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously so after bulk-inserting a bunch of documents, the index may be stale. You can force it to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName which exists for every database, by sidestepping normal queries and getting metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
    new Raven.Abstractions.Data.IndexQuery
    {
        Query = "Tag:MyDocument",
        PageSize = 0
    },
    includes: null,
    metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
Results: [ ],
Includes: [ ],
IsStale: false,
IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
TotalResults: 3,
SkippedResults: 0,
IndexName: "Raven/DocumentsByEntityName",
IndexEtag: "01000000-0000-0040-0000-00000000000B",
ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
Highlightings: { },
NonAuthoritativeInformation: false,
LastQueryTime: "2014-02-06T18:12:56.1990451Z",
DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
Session.Query<Course>()
    .Statistics(out stats)
    // Rest of query

How to get more than 200 data from a data store

I would like to link TestCase, TestFolder and TestSet.
To do so, I start 2 WsapiDataStore queries, one on TestFolder and one on TestSet.
Then I parse the data and get the matching TestCase.
Unfortunately, I have not found a way to get more than 200 elements for each query, or to
specify the starting index of the queries.
The code I use for a WsapiDataStore query is:
_GetTestSetStore : function(TestFolder, container) {
    var TestSet_Store = Ext.create('Rally.data.WsapiDataStore', {
        model : 'TestSet',
        fetch : [ 'FormattedID', 'TestSet' ],
        pageSize : PageSize,
        autoLoad : true,
        listeners : {
            load : function(TestSet_Store, TestSet_Data, success) {
                if (success) {
                    container._CleanStore(TestSet_Store, TestSet_Data, TestFolder, container);
                } else {
                    alert('TestSet store query failed');
                }
            }
        }
    });
},
Could you help, please?

Rally.data.wsapi.Store has a config property, limit.
You may set limit to Infinity (no quotes).
limit : Number
The total number of records to retrieve with the initial load. Defaults to one page's worth. To retrieve all records, specify Infinity.
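A minimal sketch of what that config could look like, written as a plain object so it can be read on its own. The model and fetch values are copied from the question; testSetStoreConfig is a hypothetical name. The answer names Rally.data.wsapi.Store, so whether the older Rally.data.WsapiDataStore class used in the question honors limit the same way is an assumption to check against your SDK version.

```javascript
// Config sketch for the store. In an app it would be passed to
// Ext.create('Rally.data.wsapi.Store', testSetStoreConfig), which is not executed here.
const testSetStoreConfig = {
  model: 'TestSet',
  fetch: ['FormattedID', 'TestSet'],
  autoLoad: true,
  limit: Infinity, // the number Infinity, not the string 'Infinity'
};

console.log(typeof testSetStoreConfig.limit); // number
```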
