RavenDB poor select performance - ravendb

I'm testing RavenDB for my future projects. Database performance is a hard requirement for me, which is why I want to be able to tune RavenDB to be at least in SQL Server's performance range. But my tests show that RavenDB is approximately 10x-20x slower on select queries than SQL Server, even when RavenDB is indexed and SQL Server doesn't have any indexes.
I populated the database with 150k documents. Each document has a collection of child elements. The database size is approx. 1GB, and so is the index size. Raven/Esent/CacheSizeMax is set to 2048 and Raven/Esent/MaxVerPages is set to 128.
Here's what the documents look like:
{
  "Date": "2028-09-29T01:27:13.7981628",
  "Items": [
    {
      "ProductId": "products/673",
      "Quantity": 26,
      "Price": {
        "Amount": 2443.0,
        "Currency": "USD"
      }
    },
    {
      "ProductId": "products/649",
      "Quantity": 10,
      "Price": {
        "Amount": 1642.0,
        "Currency": "USD"
      }
    }
  ],
  "CustomerId": "customers/10"
}
public class Order
{
    public DateTime Date { get; set; }
    public IList<OrderItem> Items { get; set; }
    public string CustomerId { get; set; }
}

public class OrderItem
{
    public string ProductId { get; set; }
    public int Quantity { get; set; }
    public Price Price { get; set; }
}

public class Price
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
}
Here's the defined index:
from doc in docs.Orders
from docItemsItem in ((IEnumerable<dynamic>)doc.Items).DefaultIfEmpty()
select new { Items_Price_Amount = docItemsItem.Price.Amount, Items_Quantity = docItemsItem.Quantity, Date = doc.Date }
I defined the index using the Management Studio, not from code, BTW (I don't know if that has any negative/positive effect on performance).
This query takes from 500ms to 1500ms to complete. (Note that this is the time needed to execute the query, shown directly in RavenDB's console, so it doesn't include HTTP request time or deserialization overhead; just query execution time.)
session.Query<Order>("OrdersIndex").Where(o =>
o.Items.Any(oi => oi.Price.Amount > 0 && oi.Quantity < 100)).Take(128).ToList();
I'm running the query on a quad-core i5 CPU at 4.2 GHz, and the database is located on an SSD.
Now, when I populated the same amount of data into SQL Server Express, with the same schema and the same number of associated objects, and without an index, SQL Server executed the same query (which includes joins) in 35ms. With an index it takes 0ms :|
All tests were performed after the db servers were warmed up.
Still, I'm quite satisfied with RavenDB's performance, but I'm curious whether I'm missing something or RavenDB really is slower than a relational database.
Sorry for my poor English.
Thanks
UPDATE
Ayende, I tried what you suggested, but when I try to define the index you sent me, I get the following error:
public Index_OrdersIndex()
{
    this.ViewText = @"from doc in docs.Orders
select new { Items_Price_Amount = doc.Items(s=>s.Price.Amount), Items_Quantity = doc.Items(s=>s.Quantity), Date = doc.Date }
";
    this.ForEntityNames.Add("Orders");
    this.AddMapDefinition(docs => from doc in docs
                                  where doc["#metadata"]["Raven-Entity-Name"] == "Orders"
                                  select new { Items_Price_Amount = doc.Items(s => s.Price.Amount), Items_Quantity = doc.Items(s => s.Quantity), Date = doc.Date, __document_id = doc.__document_id });
    this.AddField("Items_Price_Amount");
    this.AddField("Items_Quantity");
    this.AddField("Date");
    this.AddField("__document_id");
    this.AddQueryParameterForMap("Date");
    this.AddQueryParameterForMap("__document_id");
    this.AddQueryParameterForReduce("Date");
    this.AddQueryParameterForReduce("__document_id");
}
error CS1977: Cannot use a lambda expression as an argument to a dynamically dispatched operation without first casting it to a delegate or expression tree type

Davita,
The following index generates ~8 million index entries:
from doc in docs.Orders
from docItemsItem in ((IEnumerable<dynamic>)doc.Items).DefaultIfEmpty()
select new { Items_Price_Amount = docItemsItem.Price.Amount, Items_Quantity = docItemsItem.Quantity, Date = doc.Date }
This one generates far less:
from doc in docs.Orders
select new { Items_Price_Amount = doc.Items(s=>s.Price.Amount), Items_Quantity = doc.Items(s=>s.Quantity), Date = doc.Date }
And it can be queried with the same results, but in our tests it turned out to be about twice as fast.
The major problem is that you are making several range queries, which are expensive with a large number of potential values, and then you have a large number of actual matches for the query.
Doing an exact match is significantly faster, by the way.
We are still working on ways to try to speed things up.

Related

How can I improve the execution time of my CouchDB query?

I am storing a simple class consisting of the following data in my CouchDB. The Definition class just contains a list of points and additional basic data.
public class Geometry : CouchDocument
{
    public Guid SyncId { get; set; }
    public DateTimeOffset CreatedOn { get; set; }
    public Definition Definition { get; set; }
}
The SyncId in this example is a unique id which I use to identify geometries across the different microservices of my software. So I use it as primary key for the documents.
I created an index like this:
{
  "index": {
    "fields": [
      "syncId"
    ]
  },
  "name": "sync-id-index",
  "type": "json"
}
When I now try to run a query on CouchDB using the $in operator, or even just doing syncId=X1 OR syncId=X2 etc., it uses the index I created. However, it takes 16 seconds for the query to finish. If I delete the index, it takes only 4 seconds.
{
  "selector": {
    "syncId": {
      "$in": [
        "ca7be6e4-dc11-4ddf-99f3-c97f544bf998",
        "716726b9-5493-498c-b207-d4b7e63f1ef3",
        "cb6c4941-7b33-445b-8988-361930f9b39a",
        "564fc2d5-3713-4b2b-b2e5-7dd79ef4509c",
        "6c9845e3-39fa-4a3f-acb7-86a362665a13",
        "15bb9836-3bd1-42b3-b12c-5a1025490d20",
        "a0e15e75-292f-4c76-959f-8adc5e569a31",
        "39b056bf-4ff9-4ada-9a44-9552801b52c4",
        "20d9e3bf-3e32-4426-850a-86422771897a",
        "9f262c8c-e493-4bec-9871-ed612a698a8c"
      ]
    }
  }
}
How can I improve the index or this query to lower the execution time?
So I use it as primary key for the documents.
If syncId is your primary key, consider making it the _id field in CouchDB. That would be by far the most efficient way to query the documents. You can then POST to the _all_docs endpoint and specify which keys you want returned, which is very efficient. Remember to also set "include_docs": true to get the actual documents and not only the revisions.
Something like this:
POST /geometry/_all_docs HTTP/1.1
Accept: application/json
Content-Type: application/json
Host: localhost:5984
{
  "include_docs": true,
  "keys": [
    "ca7be6e4-dc11-4ddf-99f3-c97f544bf998",
    "716726b9-5493-498c-b207-d4b7e63f1ef3",
    "cb6c4941-7b33-445b-8988-361930f9b39a",
    "564fc2d5-3713-4b2b-b2e5-7dd79ef4509c",
    "6c9845e3-39fa-4a3f-acb7-86a362665a13",
    "15bb9836-3bd1-42b3-b12c-5a1025490d20",
    "a0e15e75-292f-4c76-959f-8adc5e569a31",
    "39b056bf-4ff9-4ada-9a44-9552801b52c4",
    "20d9e3bf-3e32-4426-850a-86422771897a",
    "9f262c8c-e493-4bec-9871-ed612a698a8c"
  ]
}
Some more information on _all_docs

MR index calculating average with denormalized properties

I have a C# service that responds to clients periodically requesting an array of actions to perform. Each action is stored in a RavenDB Action document with these properties, where the last two properties are denormalized for performance:
Id (String - PK)
ClientId (String - FK)
RequestId (String - Unique Id for the request. We don't store a Request entity)
RequestDateTime (Date & Time - Date & time that request was made)
RequestDuration (Time Span - How long the request took to determine the list of actions)
I want to create an MR index that provides hourly request statistics per client, so that I can see statistics for Client #1, 01/02/22 09:00-10:00, etc. I'm struggling to calculate AvgRequestDuration because the group contains duplicate RequestDuration(s) due to the data being denormalized. Obviously, min & max are not affected by duplicates.
public class Result
{
    public string ClientId { get; set; }
    public DateTime PeriodStart { get; set; }
    public TimeSpan MinRequestDuration { get; set; }
    public TimeSpan MaxRequestDuration { get; set; }
    public TimeSpan AvgRequestDuration { get; set; }
}

public ClientStatsByPeriodStartDateTime()
{
    Map = actions => from ac in actions
                     let period = TimeSpan.FromHours(1)
                     select new
                     {
                         ClientId = ac.ClientId,
                         PeriodStart = new DateTime(((ac.RequestDateTime.Ticks + period.Ticks - 1) / period.Ticks) * period.Ticks, DateTimeKind.Utc),
                         ac.RequestDuration
                     };

    Reduce = results => from result in results
                        group result by new
                        {
                            result.ClientId,
                            result.PeriodStart
                        }
                        into agg
                        select new
                        {
                            ClientId = agg.Key.ClientId,
                            PeriodStart = agg.Key.PeriodStart,
                            AvgRequestDuration = agg.Avg(x => x.RequestDuration), // This is wrong
                            MinRequestDuration = agg.Min(x => x.RequestDuration),
                            MaxRequestDuration = agg.Max(x => x.RequestDuration)
                        };
}
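Since RequestId uniquely identifies a request, one way to neutralize the duplicated RequestDuration values is to collapse the entries to one per distinct request before aggregating, then compute the average as summed ticks divided by the distinct-request count. The following is a minimal self-contained C# sketch of that arithmetic over made-up in-memory data (the tuple shapes and values are illustrative; this is not RavenDB index code):

```csharp
using System;
using System.Linq;

var actions = new[]
{
    // Two actions belonging to the same request duplicate its duration.
    (ClientId: "clients/1", RequestId: "requests/1", Duration: TimeSpan.FromSeconds(10)),
    (ClientId: "clients/1", RequestId: "requests/1", Duration: TimeSpan.FromSeconds(10)),
    (ClientId: "clients/1", RequestId: "requests/2", Duration: TimeSpan.FromSeconds(20)),
};

var stats = actions
    // Collapse to one entry per distinct request.
    .GroupBy(a => (a.ClientId, a.RequestId))
    .Select(g => (g.Key.ClientId, g.First().Duration))
    // Then aggregate per client.
    .GroupBy(r => r.ClientId)
    .Select(g => new
    {
        ClientId = g.Key,
        // Sum of ticks divided by the number of distinct requests.
        AvgRequestDuration = TimeSpan.FromTicks(g.Sum(r => r.Duration.Ticks) / g.Count()),
        MinRequestDuration = g.Min(r => r.Duration),
        MaxRequestDuration = g.Max(r => r.Duration),
    })
    .Single();

Console.WriteLine(stats.AvgRequestDuration); // 00:00:15 (a naive average over all rows would give ~13.3s)
```

In an actual map/reduce index the same idea is usually expressed by emitting a Count = 1 and a total-duration field from the map and summing both in the reduce, dividing only at query time.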
Consider using the time series feature to calculate avg, min & max: create a time series entry for each request, where the entry value holds the duration.
You can then query for data at specific times and get min, max, and avg info for the values.
You can even index time series data.
This blog post can also be useful to start.
I've decided to normalize the structure and have a single document named Request that contains an array of the Action entity. The Duration property can then be stored against the Request document.

insert huge data with duplicate using entity framework

I'm using .NET Core 2.0 with Entity Framework Code First. I have a table "HData" with these few attributes: Id (PK), Value (a unique index with the ignore-duplicate-key setting) and Time. This table may have 10,000,000+ entries.
I insert thousands of entries like this:
dataList.ForEach(i => db.HData.Add(new HData() { Value = i, Time = some.CreateDateTime }));
db.SaveChanges();
I've already set IGNORE_DUP_KEY = ON with a SQL command, but when the code reaches db.SaveChanges above, it shows me the duplicate error message:
Cannot insert duplicate key row in object 'dbo.HData' with unique index 'IX_HData_Value'. The duplicate key value is (abcde).
How can I solve this, or is there any way to catch this exception and skip to the next insert?
Update HData Models Information
public class HData
{
    public int Id { get; set; }
    public string Value { get; set; }
    public DateTime Time { get; set; }
}
OnModelCreating:
builder.Entity<HData>(b => {
    b.Property(p => p.Time).HasDefaultValueSql("GETUTCDATE()");
    b.HasIndex(p => p.Value).IsUnique(true).HasFilter(null);
});
3rd Update:
I set IGNORE_DUP_KEY = ON in the EnsureSeedData function with the code below:
context.Database.ExecuteSqlCommand("ALTER INDEX [IX_HData_Value] ON [dbo].[HData] REBUILD WITH (IGNORE_DUP_KEY = ON)");
context.SaveChanges();
You can loop over all the records and save each one individually, catching the duplicate-key exception inside the save method so the loop continues with the next record.
Sample code:
foreach (var value in dataList)
{
    SaveRecord(value); // pass one record at a time
}

void SaveRecord(string value)
{
    try
    {
        db.HData.Add(new HData { Value = value, Time = some.CreateDateTime });
        db.SaveChanges();
    }
    catch (Exception ex)
    {
        // duplicate key - skip this record and continue with the next
    }
}
If you only want the unique values inserted, you can do it this way:
dataList.Distinct().ToList().ForEach(i => db.HData.Add(new HData() { Value = i, Time = some.CreateDateTime }));
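Note that standard LINQ `Distinct` has no key-selector overload, and `ForEach` exists only on `List<T>`; when deduplicating objects by a single property you need `DistinctBy` (.NET 6+) or a `GroupBy` fallback. A minimal sketch with made-up sample values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dataList = new List<string> { "a", "b", "a", "c", "b" };

// .NET 6+: DistinctBy keeps the first element for each key.
// Here the elements are plain strings, so Distinct() would also work;
// DistinctBy matters when deduplicating objects by one property.
var unique = dataList.DistinctBy(v => v).ToList();

// Pre-.NET 6 fallback: group by the key and take the first of each group.
var uniqueFallback = dataList.GroupBy(v => v).Select(g => g.First()).ToList();

Console.WriteLine(string.Join(",", unique)); // a,b,c
```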

Ravendb Search with OrderBy not working

I'm using the latest build of RavenDB (3.0.3800).
When I run a simple query with a Search and an OrderBy, the Search is ignored. If I remove the OrderBy, the Search works and returns the correct results:
var query = _session.Query<Index_All.ReduceResult, Index_All>()
.Customize(x => x.WaitForNonStaleResults())
.Search(x => x.SearchTerm, "Some String")
.OrderBy(x => x.PublishDate);
This just returns all results, ignoring my Search completely.
Here is my Index:
public class Index_All : AbstractIndexCreationTask<MyDocument, Index_All.ReduceResult>
{
    // query model
    public class ReduceResult
    {
        public string SearchTerm { get; set; }
        public DateTimeOffset PublishDate { get; set; }
    }

    public Index_All()
    {
        Map = documents => from d in documents
                           let customer = LoadDocument<Customer>(d.Customer.Id)
                           let owner = LoadDocument<Customer>(d.Owner.Id)
                           select new
                           {
                               SearchQuery = new object[]
                               {
                                   customer.Name,
                                   owner.Name,
                               },
                               d.PublishDate,
                           };
        Index(x => x.SearchTerm, FieldIndexing.Analyzed);
    }
}
I have no idea why this is happening; the only workaround I have is to return the results unordered. Can anyone spot what the problem is here?
Thanks
You probably don't want the OrderBy to work. The result of a full-text Search is going to be ordered by Lucene score, which gives you the best matches for the search terms provided by the user. Given that, ordering by publish date would "ruin" the quality of the results.
However, I just tried this with v30k, and it appears to use the order by properly after filtering using Search.
Edit - I notice you're using "SearchTerm" for the query model and the analyze expression, but you're indexing "SearchQuery". Make those the same and it should work.

RavenDb: Join data with index

In my database I have a list of cases:
{ Id: 1, Owner: "guid1", Text: "Question1" }
{ Id: 2, Owner: "guid1", Text: "Question2" }
{ Id: 3, Owner: "guid2", Text: "Question3" }
When querying for the data I would also like to have, in my index result, the number of cases each Owner has. So I created a map/reduce index on this collection:
public class RelatedCases
{
    public Guid Owner { get; set; }
    public int Count { get; set; }
}

public class RelatedCaseIndex : AbstractMultiMapIndexCreationTask<RelatedCases>
{
    public RelatedCaseIndex()
    {
        AddMap<CaseDocument>(c => c.Select(a => new { a.Owner, Count = 1 }));

        Reduce = result => result
            .GroupBy(a => a.Owner)
            .Select(a => new
            {
                Owner = a.Key,
                Count = a.Sum(b => b.Count)
            });
    }
}
Now I just have no idea how to produce a query that includes the data from the index. Based on the documentation I tried something like:
session.Query<CaseDocument>().Customize(a => a.Include ...)
or TransformResults on a CaseIndex, which didn't work out properly.
I know I could just re-query Raven to get the list of all RelatedCases in a separate query, but I would like to do it in one round-trip.
You can't query for Cases and join the result with the map/reduce index on the fly. That's just not how it works: every query runs against an index, so what you are really asking for is joining two indexes, and that is something you need to do upfront.
In other words, put all the information you want to query on into your map/reduce index. You can then run the query on this index and .Include() the documents that you are also interested in.
I don't think you need a MultiMap index; a simple MapReduce index will suffice for this.
You can then query it like so:
session.Query<RelatedCases, RelatedCaseIndex>();
This will bring back a list of RelatedCases with the owner and count.
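For intuition, the Reduce step of RelatedCaseIndex behaves like the following plain LINQ over the mapped { Owner, Count = 1 } entries (a self-contained sketch with made-up owner ids, not actual index code):

```csharp
using System;
using System.Linq;

var mapped = new[]
{
    // one { Owner, Count = 1 } entry per case, as produced by the map
    new { Owner = "guid1", Count = 1 },
    new { Owner = "guid1", Count = 1 },
    new { Owner = "guid2", Count = 1 },
};

// Group by owner and sum the per-case counts, exactly as the Reduce does.
var reduced = mapped
    .GroupBy(a => a.Owner)
    .Select(a => new { Owner = a.Key, Count = a.Sum(b => b.Count) })
    .ToList();

Console.WriteLine(string.Join("; ", reduced.Select(r => $"{r.Owner}={r.Count}"))); // guid1=2; guid2=1
```

RavenDB re-runs the reduce over partial results, which is why summing Count (rather than counting rows) is the correct formulation.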