How can I improve the execution time of my CouchDB query? - indexing

I am storing a simple class with the following data in my CouchDB database. The Definition class just contains a list of points and some additional basic data.
public class Geometry : CouchDocument
{
    public Guid SyncId { get; set; }
    public DateTimeOffset CreatedOn { get; set; }
    public Definition Definition { get; set; }
}
The SyncId in this example is a unique id that I use to identify geometries across the different microservices of my software, so I use it as the primary key for the documents.
I created an index like this:
{
    "index": {
        "fields": [
            "syncId"
        ]
    },
    "name": "sync-id-index",
    "type": "json"
}
When I now run a query against CouchDB using the $in operator, or even just syncId=X1 OR syncId=X2 and so on, it uses the index I created. However, the query takes 16 seconds to finish. If I delete the index, it takes only 4 seconds.
{
    "selector": {
        "syncId": {
            "$in": [
                "ca7be6e4-dc11-4ddf-99f3-c97f544bf998",
                "716726b9-5493-498c-b207-d4b7e63f1ef3",
                "cb6c4941-7b33-445b-8988-361930f9b39a",
                "564fc2d5-3713-4b2b-b2e5-7dd79ef4509c",
                "6c9845e3-39fa-4a3f-acb7-86a362665a13",
                "15bb9836-3bd1-42b3-b12c-5a1025490d20",
                "a0e15e75-292f-4c76-959f-8adc5e569a31",
                "39b056bf-4ff9-4ada-9a44-9552801b52c4",
                "20d9e3bf-3e32-4426-850a-86422771897a",
                "9f262c8c-e493-4bec-9871-ed612a698a8c"
            ]
        }
    }
}
How can I improve the index or the query so that performance improves and the execution time drops?

So I use it as the primary key for the documents.
If syncId is your primary key, consider making it the _id field in CouchDB. That would be by far the most efficient way to query the documents. You can then POST to the _all_docs endpoint and specify which keys you want returned, which is very efficient. Remember to also set "include_docs": true to get the actual documents and not just the IDs and revisions.
Something like this:
POST /geometry/_all_docs HTTP/1.1
Accept: application/json
Content-Type: application/json
Host: localhost:5984
{
    "include_docs": true,
    "keys": [
        "ca7be6e4-dc11-4ddf-99f3-c97f544bf998",
        "716726b9-5493-498c-b207-d4b7e63f1ef3",
        "cb6c4941-7b33-445b-8988-361930f9b39a",
        "564fc2d5-3713-4b2b-b2e5-7dd79ef4509c",
        "6c9845e3-39fa-4a3f-acb7-86a362665a13",
        "15bb9836-3bd1-42b3-b12c-5a1025490d20",
        "a0e15e75-292f-4c76-959f-8adc5e569a31",
        "39b056bf-4ff9-4ada-9a44-9552801b52c4",
        "20d9e3bf-3e32-4426-850a-86422771897a",
        "9f262c8c-e493-4bec-9871-ed612a698a8c"
    ]
}
Some more information on _all_docs
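As a rough sketch of the same call from code (Python with only the standard library; the database name geometry and the localhost:5984 address are taken from the HTTP example above, so adjust them for your setup):

```python
import json

def all_docs_body(keys):
    """Build the JSON body for a POST to /<db>/_all_docs.

    include_docs=True makes CouchDB return the full documents
    instead of just the id/rev pairs.
    """
    return json.dumps({"include_docs": True, "keys": keys})

# Sending it requires a running CouchDB, e.g. (illustration only):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5984/geometry/_all_docs",
#     data=all_docs_body(sync_ids).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# rows = json.load(urllib.request.urlopen(req))["rows"]
```

Each returned row then has the document under its "doc" key.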


How to change JSON returned by query using Helidon 2.0.0-M2

I'm using Helidon 2.0.0-M2.
When I run the query below I get back a list of JSON objects.
dbClient.execute(exec -> exec.createNamedQuery("select-dsitem-by-id")
        .addParam("userId", dataItemId)
        .execute())
    .thenAccept(response::send)
    .exceptionally(throwable -> sendError(throwable, response));
Returned list
[
    {
        "data": "qwerty",
        "user_id": "12345"
    },
    {
        "data": "qwerty123",
        "user_id": "22345"
    }
]
The attribute names seem to be taken directly from the database column names; e.g., one attribute returned is "user_id", but I want it to be "userId". I also want to create a parent wrapper for this list, like:
{
    "userList": [
        {
            "data": "qwerty",
            "user_id": "12345"
        },
        {
            "data": "qwerty123",
            "user_id": "22345"
        }
    ]
}
What is the best way to do this with the dbclient?
Thanks
Simple approach:
Change your SQL statement to return the correct name, such as:
SELECT data, user_id as userId FROM mytable
Complicated approach:
We are working on better support for mapping to a JSON stream.
Currently there is only one (somewhat complicated) way to achieve this:
You can create a custom mapper from a DbRow to JsonObject. This mapper needs to be a general one (it must work for any DbRow of any query).
The built-in mapper uses metadata provided on the columns. I have prepared a simple example (that expects only a single type of statement):
class DbRecordMapperProvider implements DbMapperProvider {
    private static final DbMapper<JsonObject> MAPPER = new DbRecordMapper();

    @SuppressWarnings("unchecked")
    @Override
    public <T> Optional<DbMapper<T>> mapper(Class<T> aClass) {
        if (JsonObject.class.equals(aClass)) {
            return Optional.of((DbMapper<T>) MAPPER);
        }
        return Optional.empty();
    }
}
class DbRecordMapper implements DbMapper<JsonObject> {
    @Override
    public JsonObject read(DbRow dbRow) {
        return Json.createObjectBuilder()
                .add("name", dbRow.column("FIRSTPART").as(String.class))
                .add("message", dbRow.column("SECONDPART").as(String.class))
                .build();
    }

    @Override
    public Map<String, ?> toNamedParameters(JsonObject dbRecord) {
        return dbRecord;
    }

    @Override
    public List<?> toIndexedParameters(JsonObject dbRecord) {
        throw new IllegalStateException("Cannot convert json object to indexed parameters");
    }
}
The important method is public JsonObject read(DbRow dbRow).
Once you have such a DbMapperProvider, you register it with the DbClient:
dbClient = DbClient.builder()
        .config(config.get("db"))
        .mapperProvider(new DbRecordMapperProvider())
        .build();
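The renaming-and-wrapping step itself is simple once you control the mapping. As a language-neutral sketch (Python; the snake_case-to-camelCase rule and the "userList" wrapper key come from the question, everything else is illustrative):

```python
def to_camel(name):
    """Convert a snake_case column name to camelCase, e.g. user_id -> userId."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def wrap_rows(rows):
    """Rename every column of every row and wrap the list under "userList"."""
    return {"userList": [{to_camel(k): v for k, v in row.items()} for row in rows]}

rows = [{"data": "qwerty", "user_id": "12345"},
        {"data": "qwerty123", "user_id": "22345"}]
print(wrap_rows(rows))
```

In the Helidon mapper above, the equivalent place to apply such a rename is inside read(DbRow), before building the JsonObject.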

Stop all further validation if validation of the first field fails

I am trying to write validation rules for my fields.
First I want to check whether LocationId is empty or null and whether it exists in the database, and only then proceed with the other validations.
But with the code I am using, only the empty check stops the chain; validation does not stop when the LocationId does not exist in the database.
Register Model
public class RegisterModel
{
    public string LocationId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    // ...other properties to be added
}
The JSON I am passing to the API:
{
    "FirstName": "John",
    "LocationId": "1234534"
}
The LocationId does not exist in the database, but I am getting this response:
{
    "Errors": [
        {
            "FieldName": "LastName",
            "Message": "'Last Name' must not be empty."
        },
        {
            "FieldName": "LocationId",
            "Message": "Invalid request."
        }
    ]
}
The validation rules I am using:
public class RegisterModelValidator : AbstractValidator<RegisterModel>
{
    private readonly DbContext _context;

    public RegisterModelValidator(DbContext context)
    {
        this._context = context;
        this.CascadeMode = CascadeMode.StopOnFirstFailure;

        RuleFor(req => req.LocationId)
            .NotEmpty().WithMessage("param1 is missing.")
            .Must(IsValidRequest).WithMessage("Invalid request.");

        When(x => x.LocationId != null, () => {
            RuleFor(x => x.FirstName).Cascade(CascadeMode.StopOnFirstFailure).NotNull().NotEmpty();
            RuleFor(x => x.LastName).Cascade(CascadeMode.StopOnFirstFailure).NotNull().NotEmpty();
        });
    }

    private bool IsValidRequest(string req)
    {
        var locationId = long.TryParse(req, out long result) ? result : 0;
        return _context.Locations.Any(x => x.LocationExtId == locationId);
    }

    private bool BeAValidDate(string value)
    {
        DateTime date;
        return DateTime.TryParse(value, out date);
    }
}
What I want is: if the LocationId is missing or does not exist in the database, validation should stop immediately and not check the other fields.
If you look at the docs, it is mentioned that:
Setting the cascade mode only applies to validators within the same RuleFor chain. Changing the cascade mode does not affect separate calls to RuleFor. If you want to prevent one rule from running if a different rule fails, you should instead use Dependent Rules (below).
So you can try it like this, using Dependent Rules:
RuleFor(x => x.FirstName).NotNull().NotEmpty()
    .DependentRules(d => d.RuleFor(req => req.LocationId)
        .NotEmpty().WithMessage("param1 is missing.")
        .Must(IsValidRequest).WithMessage("Invalid request."));
But note that, used like this, the dependent rule has to be repeated for every property.
A better option for you may be overriding PreValidate, since:
"At first i want to check if Location Id is empty, null or if that exist in the database"
PreValidate runs before any of the rules, so returning false from it stops validation of the whole object.

Is there a way to escape a colon in an appsettings.json dictionary key in ASP.NET Core configuration?

I have this provider dictionary in appsettings.json:
"AppSettings": {
    "Providers": {
        "http://localhost:5001": "Provider1",
        "http://localhost:5002": "Provider2"
    },
    "ArrayWorks": [
        "http://localhost:5001",
        "http://localhost:5002"
    ],
    "SoDoesColonInDictionaryValue": {
        "Provider1": "http://localhost:5001",
        "Provider2": "http://localhost:5002"
    }
}
And the following throws an exception because there is a colon in the dictionary key:
Configuration.GetSection("AppSettings").Get<AppSettings>()
However, a colon works fine as a dictionary value or in an array, just not as a dictionary key.
I read that the colon has a special meaning in configuration keys, but there seems to be no way to escape it. Why?
Edit:
public class AppSettings
{
    public string ApplicationName { get; set; }
    public IDictionary<string, string> Providers { get; set; }
}
When debugging Configuration.GetSection("AppSettings"), you get this:
Key:   AppSettings:Providers:http://localhost:5001
Value: Provider1
It was intended to be something like this:
Key:   AppSettings:Providers:http_//localhost_5001
But there seems to be no way to control how Configuration treats the colons.
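To make the failure mode concrete, here is a minimal Python sketch of the flattening that .NET configuration performs on nested settings (the real provider internals differ; this only illustrates why a colon inside a key is ambiguous):

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into colon-delimited keys,
    the way .NET configuration addresses values."""
    out = {}
    for key, value in settings.items():
        path = f"{prefix}:{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

settings = {"AppSettings": {"Providers": {"http://localhost:5001": "Provider1"}}}
flat = flatten(settings)
# The flattened key is "AppSettings:Providers:http://localhost:5001" --
# the URL's own colons are indistinguishable from the path separators,
# so the binder cannot tell where the dictionary key begins.
```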
Edit:
According to aspnet/Configuration#792
Colons are reserved for special meaning in the keys, so they shouldn't
be used as part of normal key values.
This isn't supported, and the issue was closed.
Not yet. As of now there is no escape character for the colon, according to the Microsoft ASP.NET repository on GitHub, but there is an open issue (#782) on the repository, which was moved to the backlog.
As a workaround, you can swap the keys and values in AppSettings and correct it in code, as below:
"AppSettings": {
    "Providers": {
        "Provider1": "http://localhost:5001",
        "Provider2": "http://localhost:5002"
    },
    "ArrayWorks": [
        "http://localhost:5001",
        "http://localhost:5002"
    ],
    "SoDoesColonInDictionaryValue": {
        "Provider1": "http://localhost:5001",
        "Provider2": "http://localhost:5002"
    }
}
And in code, make sure to reverse the dictionary keys and values like this:
var result = _configuration.GetSection("AppSettings:Providers")
    .GetChildren()
    .ToDictionary(i => i.Value, i => i.Key);
// result["http://localhost:5001"] returns "Provider1"
// result["http://localhost:5002"] returns "Provider2"

RavenDB poor select performance

I'm testing RavenDB for my future projects. Database performance is a hard requirement for me, which is why I want to be able to tune RavenDB to be at least in SQL Server's performance range. But my tests show that RavenDB is approximately 10x-20x slower on select queries than SQL Server, even when RavenDB is indexed and SQL Server has no indexes at all.
I populated the database with 150k documents. Each document has a collection of child elements. The DB size is approx. 1 GB, and so is the index size. Raven/Esent/CacheSizeMax is set to 2048 and Raven/Esent/MaxVerPages is set to 128.
Here's what the documents look like:
{
    "Date": "2028-09-29T01:27:13.7981628",
    "Items": [
        {
            "ProductId": "products/673",
            "Quantity": 26,
            "Price": {
                "Amount": 2443.0,
                "Currency": "USD"
            }
        },
        {
            "ProductId": "products/649",
            "Quantity": 10,
            "Price": {
                "Amount": 1642.0,
                "Currency": "USD"
            }
        }
    ],
    "CustomerId": "customers/10"
}
public class Order
{
    public DateTime Date { get; set; }
    public IList<OrderItem> Items { get; set; }
    public string CustomerId { get; set; }
}

public class OrderItem
{
    public string ProductId { get; set; }
    public int Quantity { get; set; }
    public Price Price { get; set; }
}

public class Price
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
}
Here's the defined index:
from doc in docs.Orders
from docItemsItem in ((IEnumerable<dynamic>)doc.Items).DefaultIfEmpty()
select new { Items_Price_Amount = docItemsItem.Price.Amount, Items_Quantity = docItemsItem.Quantity, Date = doc.Date }
I defined the index using the Management Studio, not from code, BTW (I don't know whether that has any negative or positive effect on performance).
This query takes from 500 ms to 1500 ms to complete. (Note that this is the time needed to execute the query, shown directly in RavenDB's console, so it doesn't include HTTP request time or deserialization overhead; it is just the query execution time.)
session.Query<Order>("OrdersIndex")
    .Where(o => o.Items.Any(oi => oi.Price.Amount > 0 && oi.Quantity < 100))
    .Take(128)
    .ToList();
I'm running the query on a quad-core i5 CPU at 4.2 GHz, and the DB is located on an SSD.
Now, when I populated the same amount of data into SQL Server Express, with the same schema and the same number of associated objects, SQL Server executed the same query, which includes joins, in 35 ms without an index. With an index it takes 0 ms :|
All tests were performed with the DB servers warmed up.
Still, I'm quite satisfied with RavenDB's performance overall; I'm just curious whether I'm missing something, or whether RavenDB is simply slower than a relational database.
Sorry for my poor English.
Thanks
UPDATE
Ayande, I tried what you suggested, but when I try to define the index you sent me, I get the following error:
public Index_OrdersIndex()
{
    this.ViewText = @"from doc in docs.Orders
select new { Items_Price_Amount = doc.Items(s=>s.Price.Amount), Items_Quantity = doc.Items(s=>s.Quantity), Date = doc.Date }
";
    this.ForEntityNames.Add("Orders");
    this.AddMapDefinition(docs => from doc in docs
                                  where doc["@metadata"]["Raven-Entity-Name"] == "Orders"
                                  select new { Items_Price_Amount = doc.Items(s => s.Price.Amount), Items_Quantity = doc.Items(s => s.Quantity), Date = doc.Date, __document_id = doc.__document_id });
    this.AddField("Items_Price_Amount");
    this.AddField("Items_Quantity");
    this.AddField("Date");
    this.AddField("__document_id");
    this.AddQueryParameterForMap("Date");
    this.AddQueryParameterForMap("__document_id");
    this.AddQueryParameterForReduce("Date");
    this.AddQueryParameterForReduce("__document_id");
}
error CS1977: Cannot use a lambda expression as an argument to a dynamically dispatched operation without first casting it to a delegate or expression tree type
Davita,
The following index generates ~8 million index entries:
from doc in docs.Orders
from docItemsItem in ((IEnumerable<dynamic>)doc.Items).DefaultIfEmpty()
select new { Items_Price_Amount = docItemsItem.Price.Amount, Items_Quantity = docItemsItem.Quantity, Date = doc.Date }
This one generates far less:
from doc in docs.Orders
select new { Items_Price_Amount = doc.Items(s=>s.Price.Amount), Items_Quantity = doc.Items(s=>s.Quantity), Date = doc.Date }
It can be queried with the same results, but in our tests it turned out to be about twice as fast.
The major problem is that you are making several range queries, which are expensive when there is a large number of potential values, and you then have a large number of actual matches for the query.
Doing an exact match is significantly faster, by the way.
We are still working on ways to try to speed things up.

Getting NHibernate to store unique object only once

I have four tables in NHibernate:
Job
{
    DBID { int, primary key }
    Name { string }
    Media { fk_media (int) }
    PublishInfo { fk_publishinfo (int) }
}

Media
{
    DBID { int, primary key }
    TransmissionID { string }
    Series { string }
    Title { string }
    Description { string }
}

PublishInfo
{
    DBID { int, primary key }
    Destination { string }
    AudioTrack_1 { fk_audiolanguage }
    AudioTrack_2 { fk_audiolanguage }
    AudioTrack_3 { fk_audiolanguage }
    AudioTrack_4 { fk_audiolanguage }
}

AudioLanguage
{
    Code { string, primary key }
    Description { string }
}
What I want to achieve with NHibernate is that it only stores unique records, so if a Media is used multiple times, all Jobs point to the same Media entry. The same goes for PublishInfo.
A second issue I ran into: when using the same AudioLanguage for AudioTrack_3 and AudioTrack_4, for instance, it gives an error that the object was already used in the session.
Any tips on how to do this properly?
This is a partial answer to my own question. The reason it wasn't working correctly for the languages when using the same language on multiple tracks is that the objects get sent back and forth between a client and a server application. The way I solved this: when an object is received back on the server, I look up the proper instances stored in a list on the server and replace them.
I do not think this is the proper solution, but it works.