Hibernate Criteria - Restricting Data Based on Field in One-to-Many Relationship - sql

I need some hibernate/SQL help, please. I'm trying to generate a report against an accounting database. A commission order can have multiple account entries against it.
class CommissionOrderDAO {
    int id
    String purchaseOrder
    double bookedAmount
    Date customerInvoicedDate
    String state

    static hasMany = [accountEntries: AccountEntryDAO]
    SortedSet accountEntries

    static mapping = {
        version false
        cache usage: 'read-only'
        table 'commission_order'
        id column: 'id', type: 'integer'
        purchaseOrder column: 'externalId'
        bookedAmount column: 'bookedAmount'
        customerInvoicedDate column: 'customerInvoicedDate'
        state column: 'state'
        accountEntries sort: 'id', order: 'desc'
    }
    ...
}
class AccountEntryDAO implements Comparable<AccountEntryDAO> {
    int id
    Date eventDate
    CommissionOrderDAO commissionOrder
    String entryType
    String description
    double remainingPotentialCommission

    static belongsTo = [commissionOrder: CommissionOrderDAO]

    static mapping = {
        version false
        cache usage: 'read-only'
        table 'account_entry'
        id column: 'id', type: 'integer'
        eventDate column: 'eventDate'
        commissionOrder column: 'commissionOrder'
        entryType column: 'entryType'
        description column: 'description'
        remainingPotentialCommission formula: SQLFormulaUtils.AccountEntrySQL.REMAININGPOTENTIALCOMMISSION_FORMULA
    }
    ...
}
The criteria for the report are that commissionOrder.state == "open" and commissionOrder.customerInvoicedDate is not null, and that the account entries in the report fall between startDate and endDate and have remainingPotentialCommission > 0.
I'm looking to display information on the CommissionOrder mainly (and to display account entries on that commission order between the dates), but when I use the following projection:
def results = accountEntryCriteria.list {
    projections {
        like("entryType", "comm%")
        ge("eventDate", beginDate)
        le("eventDate", endDate)
        gt("remainingPotentialCommission", 0.0099d)
        and {
            commissionOrder {
                eq("state", "open")
                isNotNull("customerInvoicedDate")
            }
        }
    }
    order("id", "asc")
}
I get the correct accountEntries with the proper commissionOrders, but I'm going at it backwards: I have loads of accountEntries which can reference the same commissionOrder. But when I look at the commissionOrders that I've retrieved, each one has ALL of its accountEntries, not just the accountEntries between the dates.
I then loop through the results, get the commissionOrder from the accountEntriesList, and remove accountEntries on that commissionOrder after the end date to get the "snapshot" in time that I need.
def getCommissionOrderListByRemainingPotentialCommissionFromResults(results, endDate) {
    log.debug("begin getCommissionOrderListByRemainingPotentialCommissionFromResults")
    int count = 0;
    List<CommissionOrderDAO> commissionOrderList = new ArrayList<CommissionOrderDAO>()
    if (results) {
        CommissionOrderDAO[] commissionOrderArray = new CommissionOrderDAO[results?.size()];
        Set<CommissionOrderDAO> coDuplicateCheck = new TreeSet<CommissionOrderDAO>()
        for (ae in results) {
            if (!coDuplicateCheck.contains(ae?.commissionOrder?.purchaseOrder) && ae?.remainingPotentialCommission > 0.0099d) {
                CommissionOrderDAO co = ae?.commissionOrder
                CommissionOrderDAO culledCO = removeAccountEntriesPastDate(co, endDate)
                def lastAccountEntry = culledCO?.accountEntries?.last()
                if (lastAccountEntry?.remainingPotentialCommission > 0.0099d) {
                    commissionOrderArray[count++] = culledCO
                }
                coDuplicateCheck.add(ae?.commissionOrder?.purchaseOrder)
            }
        }
        log.debug("Count after clean is ${count}")
        if (count > 0) {
            commissionOrderList = Arrays.asList(ArrayUtils.subarray(commissionOrderArray, 0, count))
            log.debug("commissionOrderList size = ${commissionOrderList?.size()}")
        }
    }
    log.debug("end getCommissionOrderListByRemainingPotentialCommissionFromResults")
    return commissionOrderList
}
Please don't think I'm under the impression that this isn't a Charlie Foxtrot. The query itself doesn't take very long, but the cull process takes over 35 minutes. Right now, it's "manageable" because I only have to run the report once a month.
I need to let the database handle this processing (I think), but I couldn't figure out how to manipulate hibernate to get the results I want. How can I change my criteria?

Try to narrow down the bottleneck of that process. If you have a lot of data, then this check could be expensive:
coDuplicateCheck.contains(ae?.commissionOrder?.purchaseOrder)
Set.contains can have O(n) complexity. You could use, for example, a Map to store the keys you need to check, and then look up ae?.commissionOrder?.purchaseOrder as a key in that map.
The second thought is that every time you access ae?.commissionOrder?.purchaseOrder it may be lazily loaded from the database. Turn on query logging and check that you don't have dozens of queries running inside this processing function.
Finally, and again, I would suggest narrowing down where the most expensive part and the biggest time waste really are.
This plugin may be helpful.

Related

LINQ or Navigation Properties command to retrieve 1 to many data

I am looking for help with a LINQ SQL query please.
I have a Blazor application that gets data from an Azure SQL database. I am seeking to get a dataset from the database for binding to a datagrid, where each row is a record from the main table joined with a record from the second table. The second table has millions of records; for each main record it needs to join the one record that has the same key (securityId) and whose date is either the nominated date or the last date before the nominated date.
Because of the size of the 2nd file, I need an efficient query. Currently I am using the following, but I believe there must be a more efficient way to do it without the lag. I also tried Navigation Properties but couldn't get them to work.
reviewDateS is the date that I want the 2nd record to match, or failing that, the 2nd record should have the latest date prior to it.
result = (from cmpn in _ctx.MstarCompanies
          join prcs in _ctx.MstarPrices
              on cmpn.securityId equals prcs.securityId into cs
          from c in cs.DefaultIfEmpty()
          where c.date01 == reviewDateS
          select new ClsMarketPrices { })
Following are the 3 relevant classes. ClsMarketPrices does not relate to a database table; it is simply a class that combines the other 2 classes, which may not be necessary, but with my limited knowledge it is how it is working.
_ctx is a repository that links to the data context.
public MySQLRepositories(ApplicationDbContext ctx)
{
    _ctx = ctx;
}
public class ClsMarket
{
    [Key]
    public int CompanyId { get; set; } = 0;
    public string securityId { get; set; } = "";
    public string companyName { get; set; } = "";
    public string mic { get; set; } = "";
    public string currency { get; set; } = "";

    [ForeignKey("securityId")]
    public virtual ICollection<ClsPrices> Prices { get; set; }
}

public class ClsMarketPrices
{
    [Key]
    public int CompanyId { get; set; } = 0;
    public string companyName { get; set; } = "";
    public string period { get; set; } = "";
    public string mic { get; set; } = "";
}

public class ClsPrices
{
    [Key]
    public int PricesId { get; set; }
    [ForeignKey("securityId")]
    public string securityId { get; set; } = "";
    public string mic { get; set; } = "";
    public string date01 { get; set; } = "";

    public virtual ClsMarket ClsMarket { get; set; }
}
I want to get a record from the 1st file joined with a record from the 2nd file where that record from the 2nd file has a date equal to or the last before the nominated date.
So we are talking about files, not a database! This is important, because this means that your local process will execute the LINQ, not a database management system. In other words: the LINQ will be IEnumerable, not IQueryable.
This is important, because as Enumerable, you will be able to define your own LINQ extension methods.
Although you supplied an enormous number of irrelevant properties, you forgot to give us the most important thing: you were talking about two files, and you told us that you have two classes with a one-to-many relation, but you gave us three classes. Which ones have the relation that you are talking about?
I think that every object of ClsMarketPrices has zero or more ClsPrices, and that every ClsPrice is one of the prices of a ClsMarketPrices, namely the ClsMarketPrices that the foreign key SecurityId (rather confusing name) refers to.
First of all, let's assume you already have procedures to read the two sequences from your files. And of course, these procedures won't read more than needed (so don't read the whole file if you will only use the first ClsMarket). I assume you already know how to do that:
IEnumerable<ClsMarketPrices> ReadMarketPrices();
IEnumerable<ClsPrices> ReadPrices();
So you've got a DateTime reviewDate. Every MarketPrice has zero or more Prices. Every Price has a DateTime property DateStamp. You want, for every MarketPrice, the Price with the largest DateStamp value that is smaller than or equal to reviewDate.
If a MarketPrice doesn't have such a Price, for instance because it doesn't have any Prices at all, or because all its Prices have a DateStamp larger than reviewDate, you want a null value.
You didn't say what you want if a MarketPrice has several Prices with the same largest DateStamp <= reviewDate. I assume that you don't care which one is selected.
The straightforward LINQ method would be to use GroupJoin, Where, OrderBy and FirstOrDefault:
DateTime reviewDate = ...
IEnumerable<ClsMarketPrices> marketPrices = ReadMarketPrices();
IEnumerable<ClsPrices> prices = ReadPrices()
    .Where(price => price.DateStamp <= reviewDate);

// GroupJoin marketPrices with prices:
var result = marketPrices.GroupJoin(prices,
    marketPrice => marketPrice.CompanyId,   // from every MarketPrice take the primary key
    price => price.CompanyId,               // from every Price take the foreign key to its MarketPrice

    // parameter resultSelector: from every MarketPrice, with its zero or more matching Prices,
    // make one new object:
    (marketPrice, pricesOfThisMarketPrice) => new
    {
        // select the marketPrice properties that you plan to use:
        Id = marketPrice.CompanyId,
        Name = ...
        ...

        // from all prices of this marketPrice, take the one with the largest DateStamp;
        // we know there are no prices with a DateStamp larger than reviewDate
        LatestPrice = pricesOfThisMarketPrice.OrderByDescending(price => price.DateStamp)
            .Select(price => new
            {
                // select the price properties you plan to use:
                Id = price.PricesId,
                Date = price.DateStamp,
                ...
            })
            .FirstOrDefault(),
    });
The problem is: this must be done efficiently, because you have an immense number of MarketPrices and Prices.
Although we already limited the number of prices to sort by removing the prices that come after reviewDate, it is still a waste to order all of them if you will only use the first one.
We can optimize this by using Aggregate for pricesOfThisMarketPrice. This ensures that pricesOfThisMarketPrice is enumerated only once.
Side remarks: Aggregate only works on IEnumerable, not on IQueryable, so it won't work on a database. Furthermore, pricesOfThisMarketPrice might be an empty sequence; we have to take care of that.
LatestPrice = pricesOfThisMarketPrice.Any() ?
    pricesOfThisMarketPrice.Aggregate(
        // select the one with the largest value of DateStamp:
        (latestPrice, nextPrice) => nextPrice.DateStamp >= latestPrice.DateStamp ? nextPrice : latestPrice)
    // do not do the aggregate if there are no prices at all:
    : null,
Although this Aggregate is more efficient than OrderBy, your second sequence will still be enumerated more than once. See the source code of Enumerable.GroupJoin.
If you really want to enumerate your second source only once, and limit the number of enumerations of the first source, consider creating an extension method. This way you can use it like any other LINQ method. If you are not familiar with extension methods, see extension methods demystified.
You could create an extension method specifically for your ClsMarket and ClsPrices; however, if you think you will need to "find the largest element that belongs to another element" more often, why not create a generic method, just like LINQ does?
Below I create the most extensive extension method, one with a resultSelector and equality comparers. If you will use standard equality, consider adding an extension method without these comparers and let it call the full extension method with null values for the comparers.
For examples of overloads with and without equality comparers, see several LINQ methods, like ToDictionary: there is an overload without a comparer and one with a comparer. The first one calls the second one with a null value for the comparer.
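For instance, a comparer-less convenience overload could simply forward to the full method (a sketch only, written against the TakeLargestItem signature shown below):

// Convenience overload: no comparers supplied, so pass null and let the full
// version below fall back to EqualityComparer<TKey>.Default / Comparer<TProperty>.Default.
static IEnumerable<TResult> TakeLargestItem<T1, T2, TKey, TProperty, TResult>(
    this IEnumerable<T1> t1Sequence,
    IEnumerable<T2> t2Sequence,
    Func<T1, TKey> t1KeySelector,
    Func<T2, TKey> t2KeySelector,
    Func<T2, TProperty> propertySelector,
    TProperty propertyLimit,
    Func<T1, T2, TResult> resultSelector)
    where T2 : class
{
    return t1Sequence.TakeLargestItem(t2Sequence,
        t1KeySelector, t2KeySelector,
        propertySelector, propertyLimit,
        resultSelector,
        keyComparer: null,
        propertyComparer: null);
}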
I will use baby steps, so you can understand what happens.
This can be optimized slightly.
The most important thing is that you will enumerate your largest collection only once.
static IEnumerable<TResult> TakeLargestItem<T1, T2, TKey, TProperty, TResult>(
    this IEnumerable<T1> t1Sequence,
    IEnumerable<T2> t2Sequence,

    // Select primary and foreign key:
    Func<T1, TKey> t1KeySelector,
    Func<T2, TKey> t2KeySelector,

    // Select the property of T2 of which you want the largest element:
    Func<T2, TProperty> propertySelector,

    // The largest element must be <= propertyLimit:
    TProperty propertyLimit,

    // From T1 and the largest T2 create one TResult:
    Func<T1, T2, TResult> resultSelector,

    // equality comparer to compare primary and foreign keys:
    IEqualityComparer<TKey> keyComparer,

    // comparer to find the largest property value:
    IComparer<TProperty> propertyComparer)

    where T2 : class   // T2 must be a reference type: "no matching T2 yet" is represented by null
{
    // TODO: invent a proper method name
    // TODO: decide what to do about null input

    // if no comparers are provided, use the default comparers:
    if (keyComparer == null) keyComparer = EqualityComparer<TKey>.Default;
    if (propertyComparer == null) propertyComparer = Comparer<TProperty>.Default;

    // TODO: implement
}
The implementation is straightforward:
Put all T1 items in a dictionary: t1Key as key, {T1, T2} as value, keyComparer as comparer.
Then enumerate t2Sequence only once:
check whether the property is <= propertyLimit;
if so, look up the {T1, T2} combination with the same key in the dictionary;
check whether the current t2 item is larger than the T2 in that {T1, T2} combination;
if so: replace it.
We need an internal class:
// DictionaryValue needs its own type parameters, because it lives outside the generic method:
class DictionaryValue<T1, T2>
{
    public T1 T1 { get; set; }
    public T2 T2 { get; set; }
}
The code:
IDictionary<TKey, DictionaryValue<T1, T2>> t1Dict = t1Sequence.ToDictionary(
    t1 => t1KeySelector(t1),
    t1 => new DictionaryValue<T1, T2> { T1 = t1, T2 = (T2)null },
    keyComparer);
The enumeration of t2Sequence:
foreach (T2 t2 in t2Sequence)
{
    // check if the property is <= propertyLimit
    TProperty property = propertySelector(t2);
    if (propertyComparer.Compare(property, propertyLimit) <= 0)
    {
        // find the T1 that belongs to this T2:
        TKey key = t2KeySelector(t2);
        if (t1Dict.TryGetValue(key, out DictionaryValue<T1, T2> largestValue))
        {
            // there is a DictionaryValue with the same key.
            // is its T2 null? then t2 is the largest so far.
            // if it is not null: get the property of the largest value and use the
            // propertyComparer to see which one of them is the largest
            if (largestValue.T2 == null)
            {
                largestValue.T2 = t2;
            }
            else
            {
                TProperty largestProperty = propertySelector(largestValue.T2);
                if (propertyComparer.Compare(property, largestProperty) > 0)
                {
                    // t2 has a larger property than the largest value so far: replace it
                    largestValue.T2 = t2;
                }
            }
        }
    }
}
So for every t1, we have found the largest t2 that has a property <= propertyLimit.
Use the resultSelector to create the results.
IEnumerable<TResult> result = t1Dict.Values.Select(
    t1WithLargestT2 => resultSelector(t1WithLargestT2.T1, t1WithLargestT2.T2));
return result;
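To tie this back to the classes from the question, a call could look roughly like the sketch below. It assumes the two sequences are already available in memory, that date01 holds ISO yyyy-MM-dd strings (so the default string comparer orders them chronologically), and that the reader methods, the example date and the result shape are purely illustrative:

// Hypothetical usage only; ReadMarkets/ReadPrices stand in for however the data is loaded.
string reviewDateS = "2024-03-15";                 // example review date, ISO format assumed
IEnumerable<ClsMarket> markets = ReadMarkets();
IEnumerable<ClsPrices> prices = ReadPrices();

IEnumerable<ClsMarketPrices> rows = markets.TakeLargestItem(
    prices,
    market => market.securityId,                   // key on the market side
    price => price.securityId,                     // foreign key on the price side
    price => price.date01,                         // property to maximise ...
    reviewDateS,                                   // ... but never past the review date
    (market, latestPrice) => new ClsMarketPrices
    {
        CompanyId = market.CompanyId,
        companyName = market.companyName,
        mic = market.mic,
        period = latestPrice == null ? "" : latestPrice.date01   // null means no qualifying price
    },
    keyComparer: null,                             // default string equality for securityId
    propertyComparer: null);                       // default string ordering for date01

Because the prices are streamed once against a dictionary keyed on securityId, the large price sequence is enumerated exactly one time.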

RavenDB: static index casting and sorting issue

I have a problem with RavenDB indexing.
Simple query looks like this:
var values =
    myCollection.Query.Where(w =>
        w.MyId == MyId &&
        w.IsReady == false &&
        w.IsDeleted &&
        w.Rate > 0)
During execution Raven creates a dynamic index:
from doc in docs.MyCollection
select new { Rate = doc.Rate, IsReady = doc.IsReady, IsDeleted = doc.IsDeleted, MyId = doc.MyId }
with extra options:
Field -> Rate;
Storage -> No;
Indexing -> Default;
Sort -> Double;
The Rate field has the decimal type.
Problem:
I wanted to add a static index, but when I specified the index like this:
public class MyIndex : AbstractIndexCreationTask<MyCollection>
{
    public MyIndex()
    {
        Map = d => d.Select(s => new { Rate = s.Rate, IsReady = s.IsReady, IsDeleted = s.IsDeleted, MyId = s.MyId });
        Sort(x => x.Rate, SortOptions.Double);
    }
}
Raven creates the index slightly differently:
from doc in docs.MyCollection
select new { Rate = (decimal)doc.Rate, IsReady = doc.IsReady, IsDeleted = doc.IsDeleted, MyId = doc.MyId }
with extra options:
Field -> Rate;
Storage -> No;
Indexing -> Default;
Sort -> Double;
The only difference is that I have a cast in the static index, because my field type is decimal and I'm using the Double sort option.
Because of that, Raven is not using my static index but instead creates a dynamic one every time the query is executed.
I tried to do some casting inside Sort(), but then the index was not created at all. One way to overcome this issue is to manually modify the static index from the management console after it has been created, but that's not a good solution.
Any ideas how to deal with that?
Thanks.
Edit:
Another example:
A field of type DateTime, queried using DateTime values as predicates (greater than / less than). When creating the dynamic index Raven picks String as the SortOption, and when I try to prepare a static index I get the same kind of casting issue.
You can use the IDocumentSession.Query(string indexName, [bool isMapReduce]) or the IDocumentSession.Query<TResult, TIndexCreator>() overloads to explicitly specify a static index. So in your specific case, either IDocumentSession.Query<MyCollection, MyIndex>() or IDocumentSession.Query("MyIndex").
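A minimal sketch of the second form, assuming session is an open IDocumentSession and reusing the predicate from the question:

// Query through the static index explicitly instead of letting Raven pick a dynamic one.
var values = session.Query<MyCollection, MyIndex>()
    .Where(w => w.MyId == MyId &&
                w.IsReady == false &&
                w.IsDeleted &&
                w.Rate > 0)
    .ToList();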

How to query an index with a subcollection that has date ranges in RavenDB?

I prepared the full test case here: https://gist.github.com/pkrakowiak/cc8addf5725193a01f2d
There are Location documents. Each location can have zero or more sponsors during some time periods (represented by the IList<Sponsorship> Sponsors property). I need to return only those locations that are sponsored on a particular day (say 15th of March in my example). So such location must have at least one Sponsorship instance that matches the following query: .Where(x => x.Sponsors.Any(s => s.From <= today && s.To >= today))
I prepared two tests, one is not using an index explicitly: CanGetCurrentlySponsoredLocations, and one which uses a static index that I created: CanGetCurrentlySponsoredLocationsUsingStaticIndex. The first one will pass, the second one will fail. The question is - how do I make the second test pass? What sort of modifications do I need to apply to my Locations_ByCoordinates index?
In case you are wondering where the index name came from or what the reviews are - just ignore them. :) They are leftovers from other things that I was testing.
Update
I took this question first to the official RavenDB Google group: https://groups.google.com/forum/?fromgroups=#!topic/ravendb/ySUPXqkpTA8 Sadly, it did not bring me a solution.
The simplest index that will pass your unit test is:
private class Locations_ByCoordinates : AbstractIndexCreationTask<Location>
{
    public Locations_ByCoordinates()
    {
        Map = locations => from l in locations
                           from s in l.Sponsors
                           select new
                           {
                               Sponsors_From = s.From,
                               Sponsors_To = s.To
                           };
    }
}
You might want to pick a better name, since the coordinates aren't indexed.
I'm not sure what your other test CanSortOnSponsorshipStatus is all about though.
UPDATE
To include locations that have no sponsors, use the DefaultIfEmpty LINQ extension method. This will make sure that all locations have at least one index entry.
private class Locations_ByCoordinates : AbstractIndexCreationTask<Location>
{
    public Locations_ByCoordinates()
    {
        Map = locations => from l in locations
                           from s in l.Sponsors
                                      .DefaultIfEmpty(new Sponsorship
                                      {
                                          From = DateTime.MinValue,
                                          To = DateTime.MaxValue
                                      })
                           select new
                           {
                               Sponsors_From = s.From,
                               Sponsors_To = s.To
                           };
    }
}
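With that index in place, the test query can target it explicitly. A sketch, assuming session is an IDocumentSession and today is the date being checked; the LINQ provider maps the s.From / s.To predicates onto the Sponsors_From / Sponsors_To fields defined by the index:

// Query the static index by name (via its creation-task type) instead of a dynamic index.
var sponsoredToday = session.Query<Location, Locations_ByCoordinates>()
    .Where(x => x.Sponsors.Any(s => s.From <= today && s.To >= today))
    .ToList();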

Having difficulty with a specific Future<T> query and collections with NHibernate

For the sake of example, I am removing non-queried and non-essential data just to figure out how to do the initial query here.
I have a model structure like this.
class Path {
    Guid Id { get; protected set; }
    IList<Step> Steps { get; set; }

    void AddStep(Step entity) {
        // write up bidirectional association
    }
}

class Step {
    Guid Id { get; protected set; }
    Path Path { get; set; }
    // other data irrelevant
}
Now assuming 50,000 Paths, each with 5,000 Steps... I do realize I don't want to return all of them at once. But putting a limit on my query fetch isn't my real problem.
Here is the exact query I am attempting to use. I am getting this exception:
NHibernate.QueryException : duplicate alias: lpStep
----> System.ArgumentException : An item with the same key has already been added.
I'm not entirely sure how to handle this scenario. If I use a flat-out Fetch on the Path query, I get Select N+1 alerts from the NHibernate Profiler.
I do have batching enabled, but as far as I am aware that only really applies to inserts, not retrievals. In any case I am getting these errors back and am not sure how to handle them. Any ideas?
using (var Transaction = Session.BeginTransaction()) {
    Path lpPath = null;
    Step lpStep = null;

    var lpPaths = Session.QueryOver<Path>(() => lpPath)
        .Take(50)
        .Future<Path>();

    var lpSteps = Session.QueryOver<Step>(() => lpStep)
        .JoinAlias(() => lpPath.Steps, () => lpStep)
        .Where(o => o.Path.Id == lpPath.Id)
        .Take(12)
        .Future<Step>();

    Transaction.Commit();

    foreach (var path in lpPaths) {
        Console.WriteLine("{0} fetched {1} Steps",
            path.Id, path.Steps.Count);
    }
}
I basically want to say:
Select (50) Paths; also, as a separate select but part of the same trip, select the first (12) Steps that belong to the previously selected Paths.
But if I use a flat-out join, I get 110 rows, whereas I expect to have 2 result sets, 1 of 50 rows and 1 of 600 rows.
Can someone explain to me what I am doing wrong?
Mind you, I can do some minor alterations and the query runs, but it isn't 'optimized'. I can get the data I want, but it takes multiple trips and lazy loading. I can optimize the actual Path selection easily enough, but it is those blasted Steps. If I just take the restrictive where clause out of the lpSteps query, it returns only the first 12 Steps overall, not 12 Steps for each Path.
I've looked at some of the other Stack Overflow posts on Future<T> and found them to look a lot like this, so I don't understand why it isn't working. I suspect that what is going on is this:
lpPaths runs.
lpSteps tries to run; the first one succeeds.
lpSteps then tries to run again, finds it cannot redefine lpPaths.
Apocalypse.
I'm really hoping someone smarter than me can enlighten me on the absolute most optimal way to write this.
I can't really understand what your use case is. Why do you only need the first 12 Steps of each Path? What about processing the Steps in batches?
IList<Guid> pathIds;
while ((pathIds = Session.QueryOver<Path>()
        .Where(...)
        .Select(path => path.Id)
        .Take(100)
        .List<Guid>()).Count > 0)
{
    int batch = 0;
    const int batchsize = 600;
    IList<Step> steps;
    while ((steps = Session.QueryOver<Step>()
            .WhereRestrictionOn(step => step.Path.Id).IsIn(pathIds.ToArray())
            .Where(step => step. ...)
            .SetFirstResult(batch * batchsize)
            .Take(batchsize)
            .List<Step>()).Count > 0)
    {
        DoSomething(steps);
        batch++;
    }
}

Adding an index to a collection using EF 4.1 and XAML (for a highscore table)

I have a webservice I call from a WP7 app. I get back a list of high scores in a table (name/score). What is the simplest way to add a 3rd column on the far left which is simply the row number?
Do I need to add a property to the entity? Is there some way to get the row number?
I tried the things below with no success.
[OperationContract]
public List<DMHighScore> GetScores()
{
    using (var db = new DMModelContainer())
    {
        // return db.DMHighScores.ToList();
        var collOrderedHighScoreItem = (from o in db.DMHighScores
                                        orderby o.UserScore ascending
                                        select new
                                        {
                                            o.UserName,
                                            o.UserScore
                                        }).Take(20);

        var collOrderedHighScoreItem2 = collOrderedHighScoreItem.AsEnumerable().Select((x, i) => new DMHighScoreDTO
        {
            UserName = x.UserName,
            UserScore = x.UserScore
        }).ToList();
    }
}
[DataContract]
public class DMHighScoreDTO
{
    int Rank;
    string UserName;
    string UserScore;
}
So let's assume you want to load the top 100 users in the leaderboard and you want to have their rank included:
[OperationContract]
public List<ScoreDto> GetTop100()
{
    // LINQ to Entities query
    var query = (from u in context.Users
                 orderby u.Score
                 select new
                 {
                     u.Name,
                     u.Score
                 }).Take(100);

    // LINQ to Objects query working on the 100 records loaded from the DB
    // (Select with an index doesn't work in LINQ to Entities)
    var data = query.AsEnumerable().Select((x, i) => new ScoreDto
    {
        Rank = i + 1,
        Name = x.Name,
        Score = x.Score
    }).ToList();

    return data;
}
What will the row number be used for? If it is for ordering, might I suggest adding a column named Order and then mapping that column to your entity.
If you require a row index, you could also call .ToList() on the query and fetch the index location of each entity.
Edit:
You could add the Rank property and set it to Ignore. This will enable you to go through the collection and set the rank with a simple for loop. The property will not be persisted in the database, and it will not require any columns in the database.
It does add an extra iteration.
The other way to go about it would be to add the rank number in the generated UI rather than in the data collection being used for binding.
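A minimal sketch of the first suggestion (an unmapped Rank property plus a simple loop), assuming a Code First style DMHighScore entity; the members other than Rank are made up here because the real entity is not shown in the question:

// Illustrative stand-in for the real DMHighScore entity; the only addition is Rank.
public class DMHighScore
{
    public int Id { get; set; }            // assumed key
    public string UserName { get; set; }
    public string UserScore { get; set; }

    [NotMapped]                            // or Fluent API: modelBuilder.Entity<DMHighScore>().Ignore(e => e.Rank)
    public int Rank { get; set; }
}

// Fill the rank in with one extra pass over the already-sorted list:
using (var db = new DMModelContainer())
{
    List<DMHighScore> top = db.DMHighScores
        .OrderBy(o => o.UserScore)
        .Take(20)
        .ToList();

    for (int i = 0; i < top.Count; i++)
    {
        top[i].Rank = i + 1;
    }
}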