Enormous SQL generated for Linq-to-Sql Grouping - sql

I have one Linq to SQL query that simply joins three tables and then performs grouping. Here is the query:
(from s in mktActualSales
join p in sysPeriods on s.PeriodID equals p.PeriodID
join d in setupDesignations on s.PositionID equals d.DesignationID
group new { s, p } by new{d.Title,d.DesignationID} into temping
select
new
{
SPOPosition = temping.Key.Title,
SalesPeriods = temping.Select(x=>new {PeriodID = x.p.PeriodID, StartDate = x.p.StartDate, EndDate = x.p.EndDate, SaleTargetID = x.s.ActualSaleID, IsApproved = x.s.IsApproved}),
PositionID = temping.Key.DesignationID
}).Take(5)
When I inspect SQL generated (executed) by this query in LinqPad, there are 6 SQL statements; first one performs join and grouping and rest of queries are the same just calling same query over and over again with different parameters. Obviously parameters are the values included in Key of the group.
It makes me believe that for a record with 130 groups Linq will hit the database 131 times. How can I save so much hits to the database? Should I perform grouping after loading data into memory i.e after calling ToList on joined query?

(I am assuming mktActualSales is your table and you missed putting the Datacontext in the question.)
If it is your table you should have a look at LoadOptions on your DataContext. With this you can eager load some of the related data and this might prevent the repetitive queries.
http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions.aspx
If it is not a table, have a look on how you populate mktActualSales but judging from your question I assume it is a table.

Related

Alternate solution for the query - Used INTERSECT function in oracle plsql

I am working on the query. I have two tables one is detail table where not grouping happen and its like including all the values and other table is line table which has important column grouped together from detail table.
I want to show all the column from line table and some column from detail table.
I am using below query to fetch my records
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
INTERSECT
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
The query is working fine...
But I want to know is there any other way for I can achieve the same result by not even using Intersect.
I purpose is to find out all possible way to get the same output.
The INTERSECT operator returns the unique set of rows returned by each query. The code can be re-written with a DISTINCT operator to make the meaning clearer:
SELECT DISTINCT
xxx_line.*,
xxx_detail.phone_number,
xxx_detail.id
FROM xxx_line
JOIN xxx_detail
ON xxx_line.reference_number = xxx_detail.reference_number
AND xxx_line.org_id = xxx_detail.org_id
AND xxx_line.request_id = xxx_detail.request_id
WHERE xxx_line.request_id = 13414224
I also replaced the old-fashioned join syntax with the newer ANSI join syntax (which makes relationships clearer by forcing the join tables and conditions to be listed close to each other) and removed the meaningless table aliases (because code complexity is more directly related to the number of variables than the number of characters).

How to model and query objects in relational databases?

I have a complex database scheme for a dictionary. Each object (essentially a translation) is similar to this:
Entry {
keyword;
examples;
tags;
Translations;
}
with
Translation {
text;
tags;
examples;
}
and
Example {
text;
translation;
phonetic_script;
}
i.e. tags (i.e. grammar) can belong to either the keyword itself, or the translation (grammar of the foreign language), and similar examples can belong either to the translation itself (i.e. explaining the foreign word) or the the text in the entry. I ended up with such kind of relational design:
entries(id,keyword,)
tags(tag)
examples(id,text,...)
entrytags(entry_id,tag)
entryexamples(entry_id,example_id)
translations(id,belongs_to_entry,...)
translationtags(transl_id, tag)
translationexamples(transl_id,example_id)
My main task is querying this database. Say I search for "foo", my current way of handling is:
query all entries with foo, get ids A
foreach id in A
query all examples belonging to id
query all tags belonging to id
query all translations belonging to A, store their ids in B
foreach tr_id in B
query all tags belonging to tr_id
query all examples belonging to tr_id
to rebuild my objects. This looks cumbersome to me, and is slow. I do not see how i could significantly improve this by using joins, or otherwise. I have a hard time modeling these objects to relations in the database. Is this a proper design?
How can I make this more efficient to improve query time?
Each query being called in the loop takes at a minimum a certain base duration of time to execute, even for trivial queries. Many environment factors contribute to what this duration is but for now let's assume it's 10 milliseconds. If the first query matches 100 entries then there are at a minimum 301 total queries being called, each taking 10 ms, for a total of 3 seconds. The number of loop iterations varies which can contribute to a substantial variation in the performance.
Restructuring the queries with joins will create more complex queries but the total number of queries being called can be reduced down to a fixed number, 4 in the queries below. Suppose now that each query takes 50 ms to execute now that it is more complex and the total duration becomes 200 ms, a substantial decrease from 3000 ms.
The 4 queries show below should come close to achieving the desired result. There are other ways to write the queries such as using subquery or including the tables in the FROM clause but these show how to do it with JOINs. The condition entries.keyword = 'foo' is used to represent the condition in the original query to select the entries.
It is worth noting that if the foo condition on entries is very expensive to compute then other optimizations may be needed to further improve performance. In these examples the condition is a simple comparison which is quick to lookup in an index but using LIKE which may require a full table scan may not work well with these queries.
The following query selects all examples matching the original query. The condition from the original query is expressed as a WHERE clause on the entries.keyword column.
SELECT entries.id, examples.text
FROM entries
INNER JOIN entryexamples
ON (entries.id = entryexamples.entry_id)
INNER JOIN examples
ON (entryexamples.example_id = examples.id)
WHERE entries.keyword = 'foo';
This query selects tags matching the original query. Only two joins are used in this case because the entrytags.tag column is what is needed and joining with tags would only provide the same value.
SELECT entries.id, entrytags.tag
FROM entries
INNER JOIN entrytags
ON (entries.id = entrytags.entry_id)
WHERE entries.keyword = 'foo'';
This query selects the translation tags for the original query. This is similar to the previous query to select the entrytags but another layer of joins is used here for the translations.
SELECT entries.id, translationtags.tag
FROM entries
INNER JOIN translations
ON (entries.id = translations.belongs_to_entry)
INNER JOIN translationtags
ON (translations.id = translationtags.transl_id)
WHERE entries.keyword = 'foo';
The final query does the same as the first query for the examples but also includes the additional joins. It's getting to be a lot of joins but in general should perform significantly better than looping through and executing individual queries.
SELECT entries.id, examples.text
FROM entries
INNER JOIN translations
ON (entries.id = translations.belongs_to_entry)
INNER JOIN translationexamples
ON (translations.id = translationexamples.transl_id)
INNER JOIN examples
ON (translationexamples.example_id = examples.id)
WHERE entries.keyword = 'foo';

How to implement paging in NHibernate with a left join query

I have an NHibernate query that looks like this:
var query = Session.CreateQuery(#"
select o
from Order o
left join o.Products p
where
(o.CompanyId = :companyId) AND
(p.Status = :processing)
order by o.UpdatedOn desc")
.SetParameter("companyId", companyId)
.SetParameter("processing", Status.Processing)
.SetResultTransformer(Transformers.DistinctRootEntity);
var data = query.List<Order>();
I want to implement paging for this query, so I only return x rows instead of the entire result set.
I know about SetMaxResults() and SetFirstResult(), but because of the left join and DistinctRootEntity, that could return less than x Orders.
I tried "select distinct o" as well, but the sql that is generated for that (using the sqlserver 2008 dialect) seems to ignore the distinct for pages after the first one (I think this is the problem).
What is the best way to accomplish this?
In these cases, it's best to do it in two queries instead of one:
Load a page of orders, without joins
Load those orders with their products, using the in operator
There's a slightly more complex example at http://ayende.com/Blog/archive/2010/01/16/eagerly-loading-entity-associations-efficiently-with-nhibernate.aspx
Use SetResultTransformer(Transformers.AliasToBean()) and get the data that is not the entity.
The other solution is that you change the query.
As I see you're returning Orders that have products which are processing.
So you could use exists statement. Check nhibernate manual at 13.11. Subqueries.

nhibernate cross table query optimization

I have a query I've written with NHibernate's Criteria functionality and I want to optimize it. The query joins 4 tables. The query works, but the generated SQL is returning all the columns for the 4 tables as opposed to just the information I want to return. I'm using SetResultTransformer on the query which shapes the returned data to an Individual, but not until after the larger sql is returned from the server.
Here's the NHibernate Criteria
return session.CreateCriteria(typeof(Individual))
.CreateAlias("ExternalIdentifiers", "ExternalIdentifier")
.CreateAlias("ExternalIdentifier.ExternalIdentifierType", "ExternalIdentifierType")
.CreateAlias("ExternalIdentifierType.DataSource", "Datasource")
.Add(Restrictions.Eq("ExternalIdentifier.Text1", ExternalId))
.Add(Restrictions.Eq("ExternalIdentifierType.Code", ExternalIdType))
.Add(Restrictions.Eq("Datasource.Code", DataSourceCode))
.SetResultTransformer(new NHibernate.Transform.RootEntityResultTransformer());
And the generated sql (from NHProfiler) is
SELECT (all columns from all joined tables)
FROM INDIVIDUAL this_
inner join EXTERNAL_ID externalid1_
on this_.INDIVIDUAL_GUID = externalid1_.GENERIC_GUID
inner join EXTERNAL_ID_TYPE externalid2_
on externalid1_.EXTERNAL_ID_TYPE_GUID = externalid2_.EXTERNAL_ID_TYPE_GUID
inner join SYSTEM_SRC datasource3_
on externalid2_.SYSTEM_SRC_GUID = datasource3_.SYSTEM_SRC_GUID
WHERE externalid1_.EXTERNAL_ID_TEXT_1 = 96800 /* #p0 */
and externalid2_.EXTERNAL_ID_TYPE_CODE = 'PATIENT' /* #p1 */
and datasource3_.SYSTEM_SRC_CODE = 'TOUCHPOINT' /* #p2 */
I only want the columns back from the Individual table. I could set a projection, but then I lose the Individual type.
I could also rewrite this with DetachedCriteria.
Are these my only options?
I had exactly the same question and asked one of NHibernate development team members directly, here is the answer I got:
with the Criteria API, the only thing you could do is to use a projection list and include all of the properties of the root entity... then you would need to use the AliasToBeanResultTransformer. Far from an optimal solution obviously
if you don't mind rewriting the query with hql, then you can do it very easily. an hql query that says "from MyEntity e join e.Association" will select both the entity columns as well as the association's columns (just like the problem you're having with criteria). But an hql query that says "select e from MyEntity e join e.Association" will only select the columns of e.

Speeding up a SQL query with generic data information

Due to a variety of design decisions, we have a table, 'CustomerVariable'. CustomerVariable has three bits of information--its own id, an id to Variable (a list of possible settings the customer can have), and the value for that variable. The Variable table, on the other hand, has the information on a default--in case the CustomerVariable is not set.
This works well in most situations, allowing us not to have to create an insanely long list of information--especially in a case where there are 16 similar variables that need to be handled for a customer.
The problem comes in trying to get this information into a select. So far, our 'best' solution involves far too many joins to be efficient--we get a list of the 16 VariableIds we need information on, setting them into variables, first. Later on, however, we have to do this:
CROSS JOIN dbo.Variable v01
LEFT JOIN dbo.CustomerVariable cv01 ON cv01.customerId = c.id
AND cv01.variableId = v01.id
CROSS JOIN dbo.Variable v02
LEFT JOIN dbo.CustomerVariable cv02 ON cv02.customerId = c.id
AND cv02.variableId = v02.id
-- snip --
CROSS JOIN dbo.Variable v16
LEFT JOIN dbo.CustomerVariable cv16 ON cv16.customerId = c.id
AND cv16.variableId = v16.id
WHERE
v01.id = #cv01VariableId
v02.id = #cv02VariableId
-- snip --
v16.id = #cv16VariableId
I know there has to be a better way, but we can't seem to find it amidst crunch time. Any help would be greatly appreciated.
If your data set is relatively small and not too volatile, you may want to use materialized views (assuming your database supports them) to optimize the lookup.
If materialized views are not an option, consider writing a stored procedure that retrieves that data in two passes:
First retrieve all of the CustomerVariables available for a particular customer (or set of customers)
Next, retrieve all of the default values from the Variables table
Perform a non-distinct union on the results merging the defaults in wherever a CustomerVariable record is missing.
Essentially, this is the equivalent of:
SELECT variableId,
CASE WHEN CV.variableId = NULL THEN VR.defaultValue ELSE CV.value END
FROM Variable VR
LEFT JOIN CUstomerVariable CV on CV.variableId = VR.variableId
WHERE CV.customerId = c.id
The type of query you want is called a pivot table or crosstab query.
By far the easiest way of dealing with this is to create a view based off of a crosstab query. This will flip the columns from being vertical to being horizontal like a regular sql table. Once that is done, just query the view. Easy. ;)