If I want to apply a predicate to a document before I aggregate in a Reduce function, do I want to place that predicate in the Map function, or in the Reduce function?
So for example putting the predicate in the Map function would look like this:
Map = orders => orders
    .Where(order => order.Status != OrderStatus.Cancelled)
    .Select(order => new
    {
        Email = order.Email, // assumed property: the Reduce groups by Email, so the Map must emit it
        Name = order.Firstname + ' ' + order.Lastname,
        TotalSpent = order.Total,
        NumberOfOrders = 1
    });
Reduce = results => results
    .GroupBy(result => result.Email)
    .Select(customer => new
    {
        Email = customer.Key,
        Name = customer.Select(c => c.Name).FirstOrDefault(),
        TotalSpent = customer.Sum(c => c.TotalSpent),
        NumberOfOrders = customer.Sum(c => c.NumberOfOrders)
    });
And putting it in the Reduce function would look like this:
Map = orders => orders
    .Select(order => new
    {
        Email = order.Email, // assumed property: the Reduce groups by Email, so the Map must emit it
        Name = order.Firstname + ' ' + order.Lastname,
        TotalSpent = order.Total,
        NumberOfOrders = 1,
        Status = order.Status
    });
Reduce = results => results
    .Where(order => order.Status != OrderStatus.Cancelled)
    .GroupBy(result => result.Email)
    .Select(customer => new
    {
        Email = customer.Key,
        Name = customer.Select(c => c.Name).FirstOrDefault(),
        TotalSpent = customer.Sum(c => c.TotalSpent),
        NumberOfOrders = customer.Sum(c => c.NumberOfOrders),
        Status = (OrderStatus)0
    });
The latter seems to make more sense. However, it means that I have to add the Status property to the class of the Reduce result and then set it to some arbitrary value in the Reduce, as it doesn't actually mean anything there.
Only the first approach works for map/reduce. And no, the order will be ignored, and you can't do something like FirstOrDefault in the result.
You need to think of map/reduce as two independent functions, where the reduce function can be run multiple times on the same input. That's why the format of the input must match the format of the output. This can also happen on different servers, in parallel and asynchronously, so new documents can be saved while the indexing is running.
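To illustrate why the reduce output must have the same shape as its input, here is a minimal, language-neutral sketch in Python (not RavenDB code). The order fields, the sample data and the function names are assumptions made up for the illustration. The predicate lives in the map step, and the reduce step is applied once to everything and once in two batches whose partial results are reduced again.

```python
from collections import defaultdict

CANCELLED = "Cancelled"

def map_orders(orders):
    """Map step: apply the predicate here, then emit one entry per remaining order."""
    return [
        {"email": o["email"], "name": o["name"],
         "total_spent": o["total"], "number_of_orders": 1}
        for o in orders
        if o["status"] != CANCELLED
    ]

def reduce_entries(entries):
    """Reduce step: output entries have the same shape as input entries."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["email"]].append(entry)
    return [
        {
            "email": email,
            # Order-independent pick, since entry order is not guaranteed.
            "name": min(e["name"] for e in group),
            "total_spent": sum(e["total_spent"] for e in group),
            "number_of_orders": sum(e["number_of_orders"] for e in group),
        }
        for email, group in sorted(groups.items())
    ]

# Hypothetical sample data.
orders = [
    {"email": "a@example.com", "name": "Ann Lee", "total": 10, "status": "Paid"},
    {"email": "a@example.com", "name": "Ann Lee", "total": 5, "status": CANCELLED},
    {"email": "b@example.com", "name": "Bob Ray", "total": 7, "status": "Paid"},
    {"email": "a@example.com", "name": "Ann Lee", "total": 3, "status": "Paid"},
]

mapped = map_orders(orders)
one_pass = reduce_entries(mapped)
# Reduce two batches separately, then reduce the partial results again.
two_pass = reduce_entries(reduce_entries(mapped[:1]) + reduce_entries(mapped[1:]))
```

Both paths produce the same totals. A filter on Status inside the reduce would break on the second pass, because already-reduced entries no longer carry a meaningful Status.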
I'm accustomed to GroupBy() being more of an art than a science, but maybe someone can help me with a very specific problem:
Given the following code
var results = session.Query<MyClass>()
.GroupBy(c => c.OtherPersistentObject)
.Select(group => new
{
key = group.Key,
count = group.Count()
})
.ToList();
The generated query comes out like this:
/* [expression] */select
otherclass_.ID as col_0_0_,
cast(count(*) as INT) as col_1_0_,
otherclass_.ID as id1_1_,
otherclass_.START_DATE as start2_1_,
otherclass_.END_DATE as end3_1_,
otherclass_.Zone as zone9_1_
from
mytable mytable0_
left outer join
otherclass otherclass_
on mytable0_.otherID=otherclass_.ID
group by
mytable0_.otherID
which gives me the SQL error "Column 'otherclass_.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
Is there a way to get the Select to do what I want?
TIA
It's a known NHibernate issue, NH-3027.
As a workaround, you can use the last approach described in this answer (rewrite the GroupBy part as a sub-query). Your query can then be rewritten to something like:
var results = session.Query<MyClass>()
.Where(c => c == session.Query<MyClass>().First(cs => cs.OtherPersistentObject == c.OtherPersistentObject))
.Select(x => new
{
key = x.OtherPersistentObject,
count = session.Query<MyClass>().Count(cs => cs.OtherPersistentObject == x.OtherPersistentObject)
}).ToList();
Try this:
var results = session
.Query<MyClass>()
.GroupBy(c => c.OtherPersistentObject)
.Select(group => new
{
key = group.Key.Id,
count = group.Count()
})
.ToList();
Here you can find the reason for the error.
I have a main VendorProfile table and a 1-many VendorHistory table that contains status codes and date stamps. The query below works at retrieving only the latest status (status code and date) for each vendor. However, the view allows the user to select checkboxes of any of the status codes to filter the view. So I need to add a where clause that matches ANY of the checkbox StatusSelections.
Model Diagram
public IEnumerable<BrowseStatusModel> BrowseByStatus(int[] StatusSelections)
{
IQueryable<BrowseStatusModel> query = _db.VendorProfiles
.Include("VendorStatusHistory")
.Include("StatusCodes")
.Select(s => new BrowseStatusModel
{
ProfileID = s.ProfileID,
Name = s.Name,
CompanyName = s.CompanyName,
CompanyDBA = s.CompanyDBA,
DateCreated = s.DateCreated,
Status = s.VendorStatusHistories.OrderByDescending(o => o.DateCreated).FirstOrDefault().Id,
StatusDate = s.VendorStatusHistories.OrderByDescending(o => o.DateCreated).FirstOrDefault().DateCreated
})
.OrderBy(x => x.ProfileID);
foreach (int status in StatusSelections)
{
query = query.Where(x => x.Status == status);
}
return query;
}
The foreach loop above works, but unfortunately it creates an AND condition where ALL selections must match instead of ANY. I figured I would have to use a where clause with something like the following, but I have been unable to find the correct syntax.
.AsQueryable().Any();
Use Contains in place of that foreach loop:
query = query.Where(x => StatusSelections.Contains(x.Status));
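The difference between chaining one filter per selection and a single membership test can be sketched outside of LINQ. The following Python snippet is only an illustration with made-up vendor rows and status codes; it is not Entity Framework code.

```python
# Hypothetical rows: each vendor with its latest status code.
vendors = [
    {"profile_id": 1, "status": 10},
    {"profile_id": 2, "status": 20},
    {"profile_id": 3, "status": 30},
]
status_selections = [10, 30]

# Chaining one filter per selection ANDs the conditions together,
# so a vendor would need every selected status at once.
chained = vendors
for status in status_selections:
    chained = [v for v in chained if v["status"] == status]

# A single membership test matches ANY of the selections.
matched = [v for v in vendors if v["status"] in status_selections]
matched_ids = [v["profile_id"] for v in matched]
```

With two different selections the chained version ends up empty, while the membership test keeps vendors 1 and 3. Contains plays the role of the membership test in the LINQ query.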
I have a small problem with multiple instances of the same object after a join to another table. For testing, I created one Store with two Products (many-to-many relation). The following snippet hopefully describes my problem.
var preResult = _session.QueryOver<Store>().List(); // One store
Product productAlias = null;
var result = _session.QueryOver<Store>()
.JoinAlias(s => s.Products, () => productAlias)
.List(); // Two instances of the same store
I even think this behavior is correct but how can I prevent the multiple instances? Is it possible within the query?
For context, this is why I need the seemingly unnecessary join: I want to extend the query according to different criteria, similar to this:
Product productAlias = null;
var query = _session.QueryOver<Store>().JoinAlias(s => s.Products, () => productAlias);
if (!string.IsNullOrWhiteSpace(criteria.ProductName))
{
query.Where(Restrictions.On(() => productAlias.Name).IsInsensitiveLike(criteria.ProductName));
}
if (criteria.ProductType != null)
{
query.Where(s => productAlias.Type == criteria.ProductType);
}
var result = query.List();
Here I ran into different problems, depending on the criteria.
Try using Transformers.DistinctRootEntity in your scenario to eliminate the cartesian product.
Product productAlias = null;
var query = _session.QueryOver<Store>()
    .JoinAlias(s => s.Products, () => productAlias);
// Collapse the joined rows back to one instance per root Store.
query = query.TransformUsing(Transformers.DistinctRootEntity);
var result = query.List();
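Why the join returns the same store twice, and what a distinct-root step does about it, can be shown with a small Python sketch. The data and names are made up, and this only mimics the idea of de-duplicating the joined rows; it is not how NHibernate is implemented.

```python
# Hypothetical data: one store with two products.
stores = {1: "Main Street Store"}
store_products = [(1, "Apple"), (1, "Pear")]  # (store_id, product_name)

# The join yields one row per (store, product) pair, so the store repeats.
joined_rows = [store_id for store_id, _product in store_products if store_id in stores]

# Keep the first occurrence of each root, preserving row order.
distinct_roots = list(dict.fromkeys(joined_rows))
```

Note that the de-duplication here happens on rows that were already fetched, so combining it with paging needs care: a page of joined rows is not the same as a page of distinct stores.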
Let's split the solution into two queries.
The top one, QueryOver<Store>(), will correctly return just a distinct list. What's more, by design it will support paging (Take(), Skip()).
The inner one will return just a list of Store IDs that fully meet whatever criteria are applied.
The result SQL will look like this
SELECT ... // top one
FROM Store
WHERE StoreID IN ( SELECT StoreID ...) // inner one
Inner
Let's start with the inner select, the NHibernate detached QueryOver:
Store storeAlias = null;
Product productAlias = null;
// detached query, resulting in a set of searched StoreID
var subQuery = QueryOver.Of<Store>(() => storeAlias)
.JoinAlias((s) => s.Products, () => productAlias)
.Select((s) => s.ID); // ID projection
if (!string.IsNullOrWhiteSpace(criteria.ProductName))
{
subQuery.Where(Restrictions.On(() => productAlias.Code)
.IsInsensitiveLike(criteria.ProductName));
}
Top
Once we have filtered the Store IDs, we can use this subquery in the top query:
var query = session.QueryOver<Store>()
// IN clause
.Where(Subqueries.PropertyIn("ID", subQuery.DetachedCriteria))
.Skip(100)
.Take(50) // paging over already distinct resultset
;
var result = query.List<Store>();
And now we can apply whatever filter to inner query, and get list of Store IDs which do meet filter criteria... while working with top query, which is distinct...
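The shape of the resulting SQL can be tried out with an in-memory SQLite database. This is a sketch under assumptions: the table and column names (Store, Product, StoreProduct) and the sample rows are invented for the illustration and are not the mapping from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StoreProduct (StoreID INTEGER, ProductID INTEGER);
    INSERT INTO Store VALUES (1, 'North'), (2, 'South');
    INSERT INTO Product VALUES (1, 'Green apple'), (2, 'Red apple'), (3, 'Pear');
    INSERT INTO StoreProduct VALUES (1, 1), (1, 2), (2, 3);
""")

pattern = "%apple%"

# Plain join: one row per matching product, so store 1 appears twice.
joined = conn.execute(
    """SELECT s.StoreID FROM Store s
       JOIN StoreProduct sp ON sp.StoreID = s.StoreID
       JOIN Product p ON p.ProductID = sp.ProductID
       WHERE p.Name LIKE ?
       ORDER BY s.StoreID""",
    (pattern,),
).fetchall()

# IN subquery: the outer query only sees Store rows, so the result is
# already distinct and LIMIT/OFFSET page over stores, not joined rows.
distinct = conn.execute(
    """SELECT s.StoreID FROM Store s
       WHERE s.StoreID IN (
           SELECT sp.StoreID FROM StoreProduct sp
           JOIN Product p ON p.ProductID = sp.ProductID
           WHERE p.Name LIKE ?)
       ORDER BY s.StoreID
       LIMIT 50 OFFSET 0""",
    (pattern,),
).fetchall()
```

The join returns store 1 twice, while the IN form returns it once, which is why paging on the outer query behaves as expected.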
Can you send more parameters than needed to a prepared statement using PDO with no undesired side effects?
That might seem like a strange question, but I ask because I have 4 queries in a row which use overlapping but different sets of parameters. The relevant parts of the queries:
1st (select, different table to others):
WHERE threadID = :tid
2nd (select):
WHERE user_ID = :u_ID AND thread_ID = :tid
3rd (update if 2nd was successful):
SET time = :current_time WHERE user_ID = :u_ID AND thread_ID = :tid
4th (insert if 2nd was unsuccessful):
VALUES (:u_ID, :tid, :current_time)
Can I declare one array with the three parameters at the beginning and use it for all 4 queries?
To sort out any confusion, the queries would be executed separately. It is the parameters variable that is being reused, which means some queries would receive parameters they don't need. So something like:
$parameters = array(':tid' => $tid, ':u_ID' => $u_ID, ':current_time' => $time);
// Variable names renamed from $1st etc.: PHP variable names cannot start with a digit.
$stmt1 = $db->prepare($query1);
$stmt1->execute($parameters);
$stmt2 = $db->prepare($query2);
$stmt2->execute($parameters);
$stmt3 = $db->prepare($query3);
$stmt3->execute($parameters);
$stmt4 = $db->prepare($query4);
$stmt4->execute($parameters);
If I can, should I? Will this slow down or cause security flaws to my database or scripts?
If I can make this question a bit clearer, please ask.
Thank you!
Perhaps the documentation has been updated since this question was first asked, but now it is quite clearly stated "No"
You cannot bind more values than specified; if more keys exist in input_parameters than in the SQL specified in the PDO::prepare(), then the statement will fail and an error is emitted.
These answers should be useful in filtering out the extra parameters.
I know this is already answered and it's only asking about whether you can send extra params, but I thought people might arrive at this question, and want to know how to get around this limitation. Here's the solution I use:
$parameters = array('tid' => $tid, 'u_ID' => $u_ID, 'current_time' => $time);
$stmt1 = $db->prepare($query1);
$stmt1->execute(array_intersect_key($parameters, array_flip(array('tid'))));
$stmt2 = $db->prepare($query2);
$stmt2->execute(array_intersect_key($parameters, array_flip(array('u_ID', 'tid'))));
$stmt3 = $db->prepare($query3);
$stmt3->execute(array_intersect_key($parameters, array_flip(array('u_ID', 'tid', 'current_time'))));
$stmt4 = $db->prepare($query4);
$stmt4->execute(array_intersect_key($parameters, array_flip(array('u_ID', 'tid', 'current_time'))));
That array_intersect_key and array_flip maneuver could be extracted to its own function, like:
function filter_fields($params, $field_names) {
    // Keep only the entries of $params whose keys are listed in $field_names.
    return array_intersect_key($params, array_flip($field_names));
}
I just haven't got around to it yet.
The function flips your array of key names, so you have an array with no values, but the right keys. Then intersect filters the first array so you only have the keys that are in both arrays (in this case, only the ones in your array_flipped array). But you get the values for the original array (not the empties). So you make one array of parameters, but specify which params are actually sent to PDO.
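The same filtering idea can be sketched in Python to make the mechanics easy to check in isolation. This is only an illustration of the key-filtering step, not PDO code; the function name is borrowed from the PHP version above and the values are made up.

```python
def filter_fields(params, field_names):
    """Return only the entries of params whose keys are listed in field_names."""
    wanted = set(field_names)
    return {key: value for key, value in params.items() if key in wanted}

# One shared parameter set, filtered per query before it is bound.
parameters = {"tid": 1, "u_ID": 1, "current_time": 1353524522}
first_query_params = filter_fields(parameters, ["tid"])
second_query_params = filter_fields(parameters, ["u_ID", "tid"])
```

The original values are kept and only the keys named for a given query survive, which is what the flip-and-intersect combination achieves in PHP.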
So, with the function, you'd do:
$parameters = array('tid' => $tid, 'u_ID' => $u_ID, 'current_time' => $time);
$stmt1 = $db->prepare($query1);
$stmt1->execute(filter_fields($parameters, array('tid')));
$stmt2 = $db->prepare($query2);
$stmt2->execute(filter_fields($parameters, array('u_ID', 'tid')));
$stmt3 = $db->prepare($query3);
$stmt3->execute(filter_fields($parameters, array('u_ID', 'tid', 'current_time')));
$stmt4 = $db->prepare($query4);
$stmt4->execute(filter_fields($parameters, array('u_ID', 'tid', 'current_time')));
If you have PHP 5.4 or later, you can use the square bracket array syntax to make it even more compact:
$parameters = array('tid' => $tid, 'u_ID' => $u_ID, 'current_time' => $time);
$stmt1 = $db->prepare($query1);
$stmt1->execute(filter_fields($parameters, ['tid']));
$stmt2 = $db->prepare($query2);
$stmt2->execute(filter_fields($parameters, ['u_ID', 'tid']));
$stmt3 = $db->prepare($query3);
$stmt3->execute(filter_fields($parameters, ['u_ID', 'tid', 'current_time']));
$stmt4 = $db->prepare($query4);
$stmt4->execute(filter_fields($parameters, ['u_ID', 'tid', 'current_time']));
I got a chance to test my question, and the answer is you cannot send more parameters than the query uses. You get the following error:
PDOException Object
(
[message:protected] => SQLSTATE[HY093]: Invalid parameter number: parameter was not defined
[string:Exception:private] =>
[code:protected] => HY093
[file:protected] => C:\Destination\to\file.php
[line:protected] => line number
[trace:Exception:private] => Array
(
[0] => Array
(
[file] => C:\Destination\to\file.php
[line] => line number
[function] => execute
[class] => PDOStatement
[type] => ->
[args] => Array
(
[0] => Array
(
[:u_ID] => 1
[:tid] => 1
[:current_time] => 1353524522
)
)
)
[1] => Array
(
[file] => C:\Destination\to\file.php
[line] => line number
[function] => function name
[class] => class name
[type] => ->
[args] => Array
(
[0] => SELECT
column
FROM
table
WHERE
user_ID = :u_ID AND
thread_ID = :tid
[1] => Array
(
[:u_ID] => 1
[:tid] => 1
[:current_time] => 1353524522
)
)
)
)
[previous:Exception:private] =>
[errorInfo] => Array
(
[0] => HY093
[1] => 0
)
)
I don't know a huge amount about PDO, hence my question, but I think that because :current_time is sent but not used and the error message is "Invalid parameter number: parameter was not defined" you cannot send extra parameters which are not used.
Additionally the error code HY093 is generated. Now I can't seem to find any documentation explaining PDO codes anywhere, however I came across the following two links specifically about HY093:
What is PDO Error HY093
SQLSTATE[HY093]
It seems HY093 is generated when you incorrectly bind parameters. This must be happening here because I am binding too many parameters.
Executing multiple queries of different types with a single execute leads to problems. You can run multiple selects or multiple updates with one execute, but in this case you should create a separate prepared statement object for each query and pass it only the parameters it needs.
// for WHERE threadID = :tid
$st1 = $db->prepare($sql);
$st1->bindParam(':tid', $tid);
$st1->execute();
or
$st1->execute(array(':tid' => $tid));
// for WHERE user_ID = :u_ID AND thread_ID = :tid
$st2 = $db->prepare($sql);
$st2->bindParam(':u_ID', $u_ID);
$st2->bindParam(':tid', $tid);
$st2->execute();
or
$st2->execute(array(':tid' => $tid, ':u_ID' => $u_ID));
// for SET time = :current_time WHERE user_ID = :u_ID AND thread_ID = :tid
$st3 = $db->prepare($sql);
$st3->bindParam(':u_ID', $u_ID);
$st3->bindParam(':tid', $tid);
$st3->bindParam(':current_time', $current_time);
$st3->execute();
or
$st3->execute(array(':tid' => $tid, ':u_ID' => $u_ID, ':current_time' => $current_time));
// for VALUES (:u_ID, :tid, :current_time)
$st4 = $db->prepare($sql);
$st4->bindParam(':u_ID', $u_ID);
$st4->bindParam(':tid', $tid);
$st4->bindParam(':current_time', $current_time);
$st4->execute();
or
$st4->execute(array(':tid' => $tid, ':u_ID' => $u_ID, ':current_time' => $current_time));
I ran into the same LINQ provider problem that is described in linq-to-nhibernate-produces-unnecessary-joins:
List<Competitions> dtoCompetitions;
dtoCompetitions = (from compset in session.Query<FWBCompetitionSet>()
where compset.HeadLine == true
&& compset.A.B.CurrentSeason == true
select (new Competitions
{
CompetitionSetID = compset.CompetitionSetID,
Name = compset.Name,
Description = compset.Description,
Area = compset.Area,
Type = compset.Type,
CurrentSeason = compset.A.B.CurrentSeason,
StartDate = compset.StartDate
}
)).ToList();
This leads to a duplicated join in the generated SQL:
SELECT fwbcompeti0_.competitionsetid AS col_0_0_,
fwbcompeti0_.name AS col_1_0_,
fwbcompeti0_.DESCRIPTION AS col_2_0_,
fwbcompeti0_.area AS col_3_0_,
fwbcompeti0_.TYPE AS col_4_0_,
fwbseason3_.currentseason AS col_5_0_,
fwbcompeti0_.startdate AS col_6_0_
FROM fwbcompetitionset fwbcompeti0_
INNER JOIN A fwbcompeti1_
ON fwbcompeti0_.competitionseasonid = fwbcompeti1_.competitionseasonid
INNER JOIN A fwbcompeti2_
ON fwbcompeti0_.competitionseasonid = fwbcompeti2_.competitionseasonid
INNER JOIN B fwbseason3_
ON fwbcompeti2_.seasonid = fwbseason3_.seasonid
WHERE fwbcompeti0_.headline = @p0
AND fwbseason3_.currentseason = @p1
Notice these joins, which are exact duplicates and also hurt my SQL Server's performance:
INNER JOIN A fwbcompeti1_
ON fwbcompeti0_.competitionseasonid = fwbcompeti1_.competitionseasonid
INNER JOIN A fwbcompeti2_
ON fwbcompeti0_.competitionseasonid = fwbcompeti2_.competitionseasonid
Update1
In NHibernate 3.2, this LINQ bug is still present, and I could not find a simple and reasonable LINQ solution.
So I used QueryOver + JoinAlias + TransformUsing to get the job done, and it works perfectly for me.
FWBCompetitionSet compset = null;
FWBCompetitionSeason compseason = null;
FWBSeason season = null;
IList<Competitions> dtoCompetitions;
dtoCompetitions = session.QueryOver<FWBCompetitionSet>(() => compset)
.JoinAlias(() => compset.FWBCompetitionSeason, () => compseason)
.JoinAlias(() => compseason.FWBSeason, () => season)
.Where(() => compset.HeadLine == true)
.And(() => season.CurrentSeason == true)
.SelectList(
list => list
.Select(c => c.CompetitionSetID).WithAlias(() => compset.CompetitionSetID)
.Select(c => c.Name).WithAlias(() => compset.Name)
.Select(c => c.Description).WithAlias(() => compset.Description)
.Select(c => c.Area).WithAlias(() => compset.Area)
.Select(c => c.Type).WithAlias(() => compset.Type)
.Select(c => season.CurrentSeason).WithAlias(() => season.CurrentSeason)
.Select(c => c.StartDate).WithAlias(() => compset.StartDate)
)
.TransformUsing(Transformers.AliasToBean<Competitions>())
.List<Competitions>();
Yet Another Edit:
I think I finally found out what's going on. It seems that the LINQ to NHibernate provider has trouble navigating associations from the target to the source table and generates a separate join each time it encounters such an association.
Since you don't provide your mapping, I used the mapping from linq-to-nhibernate-produces-unnecessary-joins. This model has a Document with one Job and many TranslationUnits. Each TranslationUnit has many Translation entities.
When you try to find a Translation based on a Job, you are traversing the associations in the reverse order and the LINQ provider generates multiple joins: one for Translation -> TranslationUnit and one for TranslationUnit to Document.
This query will generate redundant joins:
session.Query<TmTranslation>()
.Where(x => x.TranslationUnit.Document.Job == job)
.OrderBy(x => x.Id)
.ToList();
If you reverse the navigation order to Document -> TranslationUnit -> Translation, you get a query that doesn't produce any redundant joins:
var items = (from doc in session.Query<Document>()
             from tu in doc.TranslationUnits
             from translation in tu.Translations
             where doc.Job == job
             orderby translation.Id
             select translation).ToList();
Given this quirkiness, QueryOver seems like a better option.
Previous Edit:
I suspect the culprit is compset.A.B.CurrentSeason. The first joined table (fwbcompeti1_) returns A, while the next two (fwbcompeti2_ and fwbseason3_) are used to return A.B. The LINQ to NHibernate provider doesn't seem to work out that the first A is not used anywhere else, and fails to remove it from the generated statement.
Try to help the optimizer a little by replacing CurrentSeason = compset.A.B.CurrentSeason with CurrentSeason = true in the select, since your where clause returns only items with CurrentSeason == true.
EDIT: What I mean is to change the query like this:
List<Competitions> dtoCompetitions;
dtoCompetitions = (from compset in session.Query<FWBCompetitionSet>()
where compset.HeadLine == true
&& compset.A.B.CurrentSeason == true
select (new Competitions
{
CompetitionSetID = compset.CompetitionSetID,
Name = compset.Name,
Description = compset.Description,
Area = compset.Area,
Type = compset.Type,
CurrentSeason = true,
StartDate = compset.StartDate
}
)).ToList();
I simply replaced the value compset.A.B.CurrentSeason with true.