Is there a way to improve the fetching performance of my LINQ to SQL query when the total row count is more than a hundred thousand? Should I split the data into 5 parts using Skip and Take? Fetching currently takes more than 40 minutes.
Dim query = From a In context.Orders
            Join b In context.Status On a.OrderItem Equals b.OrderItem
            Join c In context.Summary On a.OrderItem.Substring(0, 16) Equals c.OrderSuffix
            Where Not b.Status.Contains("E") And a.Type = "AG" And a.OrderItem = b.OrderItem And a.OrderItem.Substring(0, 16) = c.OrderSuffix
            Order By a.OrderItem.Substring(0, 16), a.Agn Ascending
            Select a.OrderItem,
                   JobOr = a.OrderItem.Substring(0, 16),
                   Suffix = a.OrderItem.Substring(0, 16).Substring(13),
                   a.Agn,
                   a.Base,
                   b.Status,
                   a.Group,
                   a.Machine,
                   a.Article,
                   a.Height,
                   a.Sol,
                   c.plan,
                   a.DatePlanned
Actually you are making a really big join, which you only filter afterwards.
If you have a filter like a.Type = "AG", apply it before you join, so that the many unnecessary join results are never even generated.
Another idea for you: the other style of writing LINQ queries, like:
Dim foo As IQueryable(Of TypeInYourTable) = yourDBContext.yourTable.Where(Function(v) v.a = ...).SomeOtherLinqFunctions...
This way you can build up the queryables for each table and filter them first (e.g. the a.Type = "AG" filter), then use the .Join(...) function to make proper joins, in which you can compare more than just one pair of values from the two datasets.
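To illustrate the idea (a sketch only, reusing the question's table and column names rather than your exact schema), filtering first pushes the generated SQL toward this shape:

-- Sketch: each join input is reduced by its own filter before the join,
-- instead of joining everything and filtering the combined result.
SELECT a.OrderItem, b.Status
FROM (SELECT * FROM Orders WHERE Type = 'AG') a
JOIN (SELECT * FROM Status WHERE Status NOT LIKE '%E%') b
    ON a.OrderItem = b.OrderItem;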
Be aware that as long as the queryable is still an IQueryable, no call has been made to the database. You need to call e.g. .ToArray() or .ToList() to force the call.
If you don't, and you instead return a queryable, close its context, and then try to read the queryable's results, you will get a runtime exception.
I have this SQL query which is really complex to me, and I tried to convert it to Entity Framework Core code, but I couldn't manage even the multiple joins.
SELECT vrCore_Product.iMasterId,
       vrCore_Product.sName [Particular],
       SUM(tCore_Indta_0.fQuantityInBase) - ISNULL(AVG(tCore_ReservedStock_0.fQuantity), 0) [Net Quantity],
       vrPos_Outlet.iMasterId [Product],
       vrCore_Product.sCode [vrCore_Product.sCode0]
FROM tCore_Data_0
JOIN tCore_Header_0 ON tCore_Header_0.iHeaderId = tCore_Data_0.iHeaderId
JOIN tCore_Indta_0 ON tCore_Data_0.iBodyId = tCore_Indta_0.iBodyId
JOIN cCore_Vouchers_0 WITH (READUNCOMMITTED) ON tCore_Header_0.iVoucherType = cCore_Vouchers_0.iVoucherType
JOIN vrCore_Product ON vrCore_Product.iMasterId = tCore_Indta_0.iProduct AND vrCore_Product.iTreeId = 0
JOIN vrPos_Outlet ON vrPos_Outlet.iMasterId = tCore_Data_0.iInvTag
LEFT JOIN
(
    SELECT iProduct,
           tCore_Data_0.iInvTag,
           SUM(CASE bReserveOrRelease WHEN 0 THEN tCore_ReservedStock_0.fQuantity ELSE -tCore_ReservedStock_0.fQuantity END) fQuantity
    FROM tCore_ReservedStock_0
    JOIN tCore_Data_0 ON tCore_Data_0.iTransactionId = tCore_ReservedStock_0.iTransactionId
    JOIN tCore_Indta_0 ON tCore_Indta_0.iBodyId = tCore_Data_0.iBodyId
    JOIN tCore_Header_0 ON tCore_Header_0.iHeaderId = tCore_Data_0.iHeaderId
    WHERE tCore_Header_0.bSuspended = 0
    GROUP BY iProduct, tCore_Data_0.iInvTag
    HAVING SUM(CASE bReserveOrRelease WHEN 0 THEN tCore_ReservedStock_0.fQuantity ELSE -tCore_ReservedStock_0.fQuantity END) <> 0
) tCore_ReservedStock_0 ON tCore_ReservedStock_0.iProduct = tCore_Indta_0.iProduct
                       AND tCore_ReservedStock_0.iInvTag = tCore_Data_0.iInvTag
WHERE tCore_Header_0.bUpdateStocks = 1
  AND tCore_Data_0.bSuspendUpdateStocks <> 1
  AND tCore_Header_0.bSuspended = 0
  AND tCore_Data_0.iAuthStatus < 2
  AND (tCore_Header_0.iDate BETWEEN dbo.DateToInt('2020-01-10') AND dbo.DateToInt('2020-01-22')
       OR (tCore_Header_0.iDate < dbo.DateToInt('2020-01-22') AND tCore_Header_0.iVoucherClass = 512))
  AND vrCore_Product.iProductType <> 'Service'
  AND vrPos_Outlet.iMasterId IN (26)
GROUP BY vrPos_Outlet.iMasterId, vrCore_Product.iMasterId, vrCore_Product.sName, vrCore_Product.sCode
HAVING SUM(tCore_Indta_0.fQuantity) <> 0
ORDER BY vrPos_Outlet.iMasterId
We think in objects when we code in C#. We think relationally when we write T-SQL. This creates a problem called the object-relational impedance mismatch. EF helps solve that problem by allowing us to think only in objects. You are trying to do the exact opposite, which is to translate a query back to an object representation via a LINQ query. Here's a tip: I never do that. My starting point is the C# object model, and I model the query thinking in LINQ. I don't care about the T-SQL.
This might seem like just a matter of syntax, since both T-SQL and LINQ are query languages, and in the end LINQ gets translated to T-SQL by EF. However, the difference between them is not only syntax but the way we think. For instance, while we think about joins in T-SQL, in LINQ we think about navigation properties. In one, we think about relations and foreign keys; in the other, about objects and graphs.
In the very rare case where I'm better off with a T-SQL query, I'd rather execute a raw query than go through the pain of translating it back to LINQ: https://learn.microsoft.com/en-us/ef/core/querying/raw-sql
In your case I would do either of the following two things:
I would either simply throw away the T-SQL query and start over, thinking in LINQ. It helps if I have a good understanding of the business domain and of what I need to extract with that query. This is, most of the time, my favourite approach.
Or, in case I'm too committed to the query (pun not intended), I would simply execute it as a raw query using EF, always using parameters instead of concatenating input values into the query, to prevent SQL injection vulnerabilities. The only downside is that your code stops being database-engine independent the moment you add a raw query to it.
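For illustration, this is roughly the shape a parameterized query takes at the SQL Server level (parameterized client queries arrive as sp_executesql calls); the parameter names here are hypothetical:

-- Hypothetical sketch: the parameterized form of the date filter.
-- The parameters travel separately from the SQL text, which is what
-- prevents injection.
EXEC sp_executesql
    N'SELECT tCore_Header_0.iHeaderId
      FROM tCore_Header_0
      WHERE tCore_Header_0.iDate BETWEEN dbo.DateToInt(@from) AND dbo.DateToInt(@to)',
    N'@from varchar(10), @to varchar(10)',
    @from = '2020-01-10',
    @to = '2020-01-22';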
If you have the relationships mapped as navigation properties, you can just use Include(), like:
var data = _context.TcoreData0
    .Include(x => x.TcoreHeader0)
    .Include(x => x.TcoreIndta0)
    .Include("TcoreHeader0.CcoreVouchers0") // string overload for a nested navigation
    .Include...
When I use the Ibis API to query Impala, Ibis forces each table into a subquery for some reason (and when I join 4-5 tables it suddenly becomes super slow). It simply won't join normally, because of a column-name overlap problem on joins. I want a way to quickly rename the columns; isn't that how SQL usually works?
i0 = impCon.table('shop_inventory')
s0 = impCon.table('shop_expenditure')
# Rename the overlapping join columns up front
s0 = s0.relabel({'element_date': 'spend_element_date', 'element_shop_item': 'spend_shop_item'})
jn = i0.inner_join(s0, [i0['element_date'] == s0['spend_element_date'], i0['element_shop_item'] == s0['spend_shop_item']])
jn = jn.materialize()  # materialize() returns a new expression; reassign it
jn.execute(limit=900)
Ibis then generates SQL that wraps everything in subqueries without me asking for it:
SELECT *
FROM (
SELECT `element_date`, `element_shop_item`, `element_address`, `element_expiration`,
`element_category`, `element_description`
FROM dbp.`shop_inventory`
) t0
INNER JOIN (
SELECT `element_shop_item` AS `spend_shop_item`, `element_comm` AS `spend_comm`,
`element_date` AS `spend_date`, `element_amount`,
`element_spend_type`, `element_shop_item_desc`
FROM dbp.`shop_spend`
) t1
ON (`element_shop_item` = t1.`spend_shop_item`) AND
(`element_category` = t1.`spend_category`) AND
(`element_subcategory` = t1.`spend_subcategory`) AND
(`element_comm` = t1.`spend_comm`) AND
(`element_date` = t1.`spend_date`)
LIMIT 900
Why is this so difficult?
It should ideally be as simple as:
jn = i0.inner_join(s0, [s0['element_date'].name('spend_date') == i0['element_date']])
to generate a single query along the lines of:
SELECT i0.element_date, s0.element_date AS spend_date
FROM dbp.shop_inventory i0
INNER JOIN dbp.shop_spend s0 ON s0.element_date = i0.element_date
right?
Are we never allowed to have the same column names on tables that are being joined? I am pretty sure that in raw SQL you can just use "X AS Y" without needing a subquery.
I spent the last few hours struggling with this same issue. A better solution I found is the following: join with the column names kept the same, then, before you materialize, select only a subset of the columns so that there isn't any overlap.
So in your code it would look something like this:
# Join on the original (identical) column names
jn = i0.inner_join(s0, [i0['element_date'] == s0['element_date'], i0['element_shop_item'] == s0['element_shop_item']])
# Project all of i0 plus only the non-overlapping columns of s0
expr = jn[i0, s0['variable_of_interest_1'], s0['variable_of_interest_2']]
expr.materialize()
See here for more resources: https://docs.ibis-project.org/sql.html
I am dealing with a monster query (~800 lines) on Oracle 11, and it is consuming expensive resources.
The main problem is a mouvement table with about 18 million rows, against which I have around 30 left joins like these:
LEFT JOIN mouvement mracct_ad1
ON mracct_ad1.code_portefeuille = t.code_portefeuille
AND mracct_ad1.statut_ligne = 'PROPRE'
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
AND mracct_ad1.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zias
ON mracct_zias.code_portefeuille = t.code_portefeuille
AND mracct_zias.statut_ligne = 'PROPRE'
AND substr(mracct_zias.code_valeur,1,4) = 'PRAC'
AND mracct_zias.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zixs
ON mracct_zixs.code_portefeuille = t.code_portefeuille
AND mracct_zixs.statut_ligne = 'XROPRE'
AND substr(mracct_zixs.code_valeur,1,4) = 'MRAT'
AND mracct_zixs.code_transaction = t.code_transaction
Is there some way I can get rid of the left joins (a union join, for example) to make the query faster and consume less, perhaps guided by the execution plan?
Just a note on performance. Usually you want to "rephrase" conditions like:
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
In simple words, wrapping the column in an expression on the left side of the equality prevents the best usage of indexes and may push the SQL optimizer toward a less-than-optimal plan. The database engine will end up doing more work than is really needed, and the query will be [much] slower. In extreme cases the optimizer may even decide to use a full table scan. In this case you can rephrase the condition as:
AND mracct_ad1.code_valeur like 'MRAC%'
or:
AND mracct_ad1.code_valeur >= 'MRAC' AND mracct_ad1.code_valeur < 'MRAD'
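If rewriting the predicate is not an option, another route (assuming you are free to add indexes) is a function-based index, which Oracle supports; a minimal sketch:

-- Sketch (Oracle): a function-based index lets the original
-- substr() predicate use an index without rewriting the query.
CREATE INDEX ix_mouvement_code_valeur4 ON mouvement (SUBSTR(code_valeur, 1, 4));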
I am guessing so. Your code sample doesn't make much sense, but you can probably do conditional aggregation:
left join
(select m.code_portefeuille, m.code_transaction,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'MRAC%' then ? end) as ad1,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'PRAC%' then ? end) as zia,
. . . -- for all the rest of the joins as well
from mouvement m
group by m.code_portefeuille, m.code_transaction
) m
on m.code_portefeuille = t.code_portefeuille and m.code_transaction = t.code_transaction
You can probably replace all 30 joins with a single join to the aggregated table.
I have a SQL query comprising a two-level sub-select, and it is taking too much time.
The query goes like this:
select * from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF
where calculation_context_key = 130205268077
and DERIV_POSITION_KEY in
(select ctry_risk_derivs_psn_key
from DALDBO.V_COUNTRY_DERIV_PSN
where calculation_context_key = 130111216755
--and ctry_risk_derivs_psn_key = 76296412
and CREDIT_PRODUCT_TYPE = 'SWP OP'
and CALC_OBLIGOR_COUNTRY_OF_ASSETS in
(select ctry_cd
from DALDBO.V_PSN_COUNTRY
where calculation_context_key = 130134216755
--and ctry_risk_derivs_psn_key = 76296412
)
)
These tables are huge! Are there any optimizations available?
Without knowing anything about your table or view definitions, indexing, etc., I would start by looking at the sub-selects and ensuring that they perform optimally. I would also want to know how many values each sub-select returns, as this can impact performance.
How is calculation_context_key used to retrieve rows from V_COUNTRY_DERIV_PSN and V_PSN_COUNTRY? Is it an optimal execution plan?
How are DERIV_POSITION_KEY and CALC_OBLIGOR_COUNTRY_OF_ASSETS used in V_COUNTRY_DERIV_SUMMARY_XREF to retrieve rows? Again, look at the explain plan.
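A quick way to check, assuming Oracle (as the other answer here does), is to capture the plan for each sub-select; a minimal sketch using the innermost one:

-- Sketch (Oracle): capture and display the execution plan.
EXPLAIN PLAN FOR
select ctry_cd
from DALDBO.V_PSN_COUNTRY
where calculation_context_key = 130134216755;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);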
First of all, can you write this query using inner joins (and not sub-selects)?
select a.*
from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF a
join DALDBO.V_COUNTRY_DERIV_PSN b
  on a.DERIV_POSITION_KEY = b.ctry_risk_derivs_psn_key
join DALDBO.V_PSN_COUNTRY c
  on b.CALC_OBLIGOR_COUNTRY_OF_ASSETS = c.ctry_cd
where a.calculation_context_key = 130205268077
  and b.calculation_context_key = 130111216755
  --and b.ctry_risk_derivs_psn_key = 76296412
  and b.CREDIT_PRODUCT_TYPE = 'SWP OP'
  and c.calculation_context_key = 130134216755
  --and c.ctry_risk_derivs_psn_key = 76296412
Second, best practice says that when you don't select any data from the tables in the sub-select, you are better off using EXISTS instead of IN. New versions of Oracle do this automatically and actually rewrite the whole thing as an inner join.
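A minimal sketch of that EXISTS form, keeping the keys and literals of the original query:

select a.*
from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF a
where a.calculation_context_key = 130205268077
  and exists (select 1
              from DALDBO.V_COUNTRY_DERIV_PSN b
              where b.ctry_risk_derivs_psn_key = a.DERIV_POSITION_KEY
                and b.calculation_context_key = 130111216755
                and b.CREDIT_PRODUCT_TYPE = 'SWP OP'
                and exists (select 1
                            from DALDBO.V_PSN_COUNTRY c
                            where c.ctry_cd = b.CALC_OBLIGOR_COUNTRY_OF_ASSETS
                              and c.calculation_context_key = 130134216755))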
Last, without any knowledge of your data or of what you are trying to do, I would suggest using views as little as you can. If you can query the underlying tables directly, that would be best, and you will probably see an immediate performance improvement.
I have a query joining 4 tables with a lot of conditions in the WHERE clause. The query also includes an ORDER BY clause on a numeric column. It takes 6 seconds to return, which is too long, and I need to speed it up. Surprisingly, I found that if I remove the ORDER BY clause it takes 2 seconds. Why does the ORDER BY make such a massive difference, and how can I optimize it? I am using SQL Server 2005. Many thanks.
I cannot confirm that the ORDER BY makes a big difference, since I am clearing the execution plan cache. However, can you shed some light on how to speed this up a little? The query is as follows (for simplicity it says SELECT *, but I am only selecting the columns I need).
SELECT *
FROM View_Product_Joined j
INNER JOIN [dbo].[OPR_PriceLookup] pl
    ON pl.siteID = NodeSiteID
    AND pl.skuid = j.skuid
LEFT JOIN [dbo].[OPR_InventoryRules] irp
    ON irp.ID = pl.SkuID
    AND irp.InventoryRulesType = 'Product'
LEFT JOIN [dbo].[OPR_InventoryRules] irs
    ON irs.ID = pl.siteID
    AND irs.InventoryRulesType = 'Store'
WHERE SiteName = N'EcommerceSite'
  AND Published = 1
  AND DocumentCulture = N'en-GB'
  AND NodeAliasPath LIKE N'/Products/Cats/Computers/Computer-servers/%'
  AND NodeSKUID IS NOT NULL
  AND SKUEnabled = 1
  AND pl.PriceLookupID IN (SELECT TOP 1 PriceLookupID
                           FROM OPR_PriceLookup pl2
                           WHERE pl.skuid = pl2.skuid
                             AND (pl2.RoleID = -1 OR pl2.RoleID = 13)
                           ORDER BY pl2.RoleID DESC)
ORDER BY NodeOrder ASC
Why does the ORDER BY make such a massive difference, and how can I optimize it?
The ORDER BY needs to sort the result set, which may take a long time if it's big.
To optimize it, you may need to index the tables properly.
The index access path, however, has its own drawbacks, so it can even take longer.
If you have something other than equijoins in your query, or ranged predicates (like <, > or BETWEEN), or a GROUP BY clause, then the index used for the ORDER BY may prevent the other indexes from being used.
If you post the query, I'll probably be able to tell you how to optimize it.
Update:
Rewrite the query:
SELECT *
FROM View_Product_Joined j
LEFT JOIN
[dbo].[OPR_InventoryRules] irp
ON irp.ID = j.skuid
AND irp.InventoryRulesType = 'Product'
LEFT JOIN
[dbo].[OPR_InventoryRules] irs
ON irs.ID = j.NodeSiteID
AND irs.InventoryRulesType = 'Store'
CROSS APPLY
(
SELECT TOP 1 *
FROM OPR_PriceLookup pl
WHERE pl.siteID = j.NodeSiteID
AND pl.skuid = j.skuid
AND pl.RoleID IN (-1, 13)
ORDER BY
pl.RoleID desc
) pl
WHERE SiteName = N'EcommerceSite'
AND Published = 1
AND DocumentCulture = N'en-GB'
AND NodeAliasPath LIKE N'/Products/Cats/Computers/Computer-servers/%'
AND NodeSKUID IS NOT NULL
AND SKUEnabled = 1
ORDER BY
NodeOrder ASC
The relation View_Product_Joined, as the name suggests, is probably a view.
Could you please post its definition?
If it is indexable, you may benefit from creating an index on View_Product_Joined (SiteName, Published, DocumentCulture, SKUEnabled, NodeOrder).
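For reference, the mechanics on SQL Server: a view can only be indexed if it is schema-bound, and it needs a unique clustered index before any nonclustered one. The key column below is a hypothetical choice:

-- Sketch (SQL Server): the view definition must use WITH SCHEMABINDING.
-- A unique clustered index is required first; NodeSKUID is a hypothetical
-- unique key here.
CREATE UNIQUE CLUSTERED INDEX IX_View_Product_Joined
    ON View_Product_Joined (NodeSKUID);

CREATE NONCLUSTERED INDEX IX_View_Product_Joined_Filter
    ON View_Product_Joined (SiteName, Published, DocumentCulture, SKUEnabled, NodeOrder);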