Check the query efficiency - sql

I have this below SQL query that I want to get an opinion on whether I can improve it using Temp Tables or something else or is this good enough? So basically I am just feeding the result set from inner query to the outer one.
SELECT S.SolutionID
,S.SolutionName
,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
SELECT DISTINCT sf.SolutionID
FROM dbo.SolutionToFeature sf
WHERE sf.SolutionToFeatureID IN (
SELECT sfg.SolutionToFeatureID
FROM dbo.SolutionFeatureToUsergroup SFG
WHERE sfg.UsergroupID IN (
SELECT UG.UsergroupID
FROM dbo.Usergroup UG
WHERE ug.SiteID = #SiteID
)
)
)

It's going to depend largely on the indexes you have on those tables. Since you are only selecting data out of the Solution table, you can put everything else in an exists clause, do some proper joins, and it should perform better.
The exists clause will allow you to remove the distinct you have on the SolutionToFeature table. Distinct will cause a performance hit because it is basically creating a temp table behind the scenes to do the comparison on whether or not the record is unique against the rest of the result set. You take a pretty big hit as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solutin S
Where Exists (
select 1
from dbo.SolutionToFeature sf
Inner Join dbo.SolutionToFeatureTousergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
Inner Join dbo.UserGroup UG on sfg.UserGroupID = UG.UserGroupID
Where S.SolutionID = sf.SolutionID
and UG.SiteID = #SiteID
)

Related

Is there a better way to optimize this SQL query?

Assume my table has millions of records.
The following columns have indexes:
type
date
unique_id
Here is my query:
SELECT TOP (1000) T.TIME, T.TYPE, F.NAME,
B.NAME, T.MESSAGE
FROM MY_TABLE T
LEFT OUTER JOIN FOO F ON F.ID = T.FID
LEFT OUTER JOIN BAR B ON B.ID = T.BID
WHERE T.TYPE IN ('success', 'failure')
AND T.DATE BETWEEN 1592585183437 AND 1594232320525
AND T.UNIQUE_ID = "my unique ID"
ORDER BY T.DATE DESC
My question is am I causing myself any trouble with this query if I have tons of records in my table? Can this be optimized further?
My question is am I causing myself any trouble with this query if I have tons of records
in my table?
Wrong question. As long as you are only asking for data you need, you NEED it. Any trouble means you STIL need it.
The query looks as good as it gets. I somehow doubt the TOP 10000 (that is a LOT of data unless you compress it somehow).
The question is more whether you have the proper indices. Without the proper indices. I would also not use strings like this for TYPE, but then the question is about the query, not possible failures in table design.
Check indices.

t-sql include inner join on a table only if records present

I have a very big table with tons and tons of records.
[HugeTable](id, col1, col2, col3...)
There is a page on the front end application showing this [HugeTable] data based on many filters. One of the filters will give a subset of [HugeTable], if not null
#HugeTable_subset(id)
if this filter is present, #HugeTable_subset would have records. I would like to narrow down [HugeTable] data to only matching records in #HugeTable_subset.
so right now, in the t-sql, I am doing an if-else kind of query
IF (SELECT Count(*) FROM #HugeTable_subset) > 0
BEGIN
SELECT HugeTable.* FROM [HugeTable] h
JOIN #HugeTable_subset t
ON h.id = t.id
WHERE h.params = #searchParams
END
ELSE
BEGIN
SELECT * FROM [HugeTable] h
WHERE h.params = #searchParams
END
Is there a way I could merge these two selects into one?
To join the two selects in one you can just use a LEFT OUTTER JOIN instead of a INNER JOIN.
You probably already know that, yes, maybe you don't knows you already doing it in the most optimized way. sql-server ill create two sub-query plans for each select inside the IF-ELSE and use each properly.
You can acid teste it to see if there are any difference and if the IF-ELSE really beats up the LEFT JOIN option
Also there's still two point I can point out.
1) Good Indexes over the filters can really improve your performance.
2) You can use pagination to return just a few results, improving performance and user experience when the result returns a ton of records
SELECT HugeTable.* FROM [HugeTable] h
WHERE ((SELECT Count(*) FROM #HugeTable_subset) = 0) OR
h.id IN (SELECT t.id from #HugeTable_subset t))

Alternative for joining two tables multiple times

I have a situation where I have to join a table multiple times. Most of them need to be left joins, since some of the values are not available. How to overcome the query poor performance when joining multiple times?
The Scenario
Tables
[Project]: ProjectId Guid, Name VARCHAR(MAX).
[UDF]: EntityId Guid, EntityType Char(1), UDFCode Guid, UDFName varchar(20)
[UDFDetail]: UDFCode Guid, Description VARCHAR(MAX)
Relationship:
[Project].ProjectId - [UDF].EntityId
[UDFDetail].UDFCode - [UDF].UDFCode
The UDF table holds custom fields for projects, based on the UDFName column. The value for these fields, however, is stored on the UDFDetail, in the column Description.
I have lots of custom columns for Project, and they are stored in the UDF table.
So for example, to get two fields for the project I do the following select:
SELECT
p.Name ProjectName,
ud1.Description Field1,
ud1.UDFCode Field1Id,
ud2.Description Field2,
ud2.UDFCode Field2Id
FROM
Project p
LEFT JOIN UDF u1 ON
u1.EntityId = p.ProjectId AND u1.ItemName='Field1'
LEFT JOIN UDFDetail ud1 ON
ud1.UDFCode = u1.UDFCode
LEFT JOIN UDF u2 ON
u2.EntityId = p.ProjectId AND u2.ItemName='Field2'
LEFT JOIN UDFDetail ud2 ON
ud2.UDFCode = u2.UDFCode
The Problem
Imagine the above select but joining with like 15 fields. In my query I have around 10 fields already and the performance is not very good. It is taking about 20 seconds to run. I have good indexes for these tables, so looking at the execution plan, it is doing only index seeks without any lookups. Regarding the joins, it needs to be left join, because Field 1 might not exist for that specific project.
The Question
Is there a more performatic way to retrieve the data?
How would you do the query to retrieve 10 different fields for one project in a schema like this?
Your choices are pivot, explicit aggregation (with conditional functions), or the joins. If you have the appropriate indexes set up, the joins may be the fastest method.
The correct index would be UDF(EntityId, ItemName, UdfCode).
You can test if the group by is faster by running a query such as:
SELECT count(*)
FROM p LEFT JOIN
UDF u1
ON u1.EntityId = p.ProjectId LEFT JOIN
UDFDetail ud1
ON ud1.UDFCode = u1.UDFCode;
If this runs fast enough, then you can consider the group by approach.
You can try this very weird contraption (it does not look pretty, but it does a single set of outer joins). The intermediate result is a very "wide" and "long" dataset, which we can then "compact" with aggregation (for example, for each ProjectName, each Field1 column will have N result, N-1 NULLs and 1 non-null result, which is then selecting with a simple MAX aggregation) [N is the number of fields].
select ProjectName, max(Field1) as Field1, max(Field1Id) as Field1Id, max(Field2) as Field2, max(Field2Id) as Field2Id
from (
select
p.Name as ProjectName,
case when u.UDFName='Field1' then ud.Description else NULL end as Field1,
case when u.UDFName='Field1' then ud.UDFCode else NULL end as Field1Id,
case when u.UDFName='Field2' then ud.Description else NULL end as Field2,
case when u.UDFName='Field2' then ud.UDFCode else NULL end as Field2Id
from Project p
left join UDF u on p.ProjectId=u.EntityId
left join UDFDetail ud on u.UDFCode=ud.UDFCode
) tmp
group by ProjectName
The query can actually be rewritten without the inner query, but that should not make a big difference :), and looking at Gordon Linoff's suggestion and your answer, it might actually take just about 20 seconds as well, but it is still worth giving a try.

Number of Records don't match when Joining three tables

Despite going through every material I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. This yields the right columns but gives me more or less number of records with interchanging different join options.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing your tables values it is hard to give a complete answer to your question. The issue that is causing you a problem based on how you described it. Is more then likely based on the type of joins you are using.
The best way I found to understand what type of joins you should be using would referencing a Venn diagram explaining the different type of joins that you can use.
Jeff Atwood also has a really good explanation of SQL joins on his site using the above method as well.
Best to just use the query builder. Drop in your main table. Choose the columns you want. Now for any of the other lookup values then simply drop in the other tables, draw the join line(s), double click and use a left join. You can do this for 2 or 30 columns that need to "grab" or lookup other values from other tables. The number of ORIGINAL rows in the base table returned should ALWAYS remain the same.
So just use the query builder and follow the above.
The problem with your posted SQL is you NESTED the joins inside (). Don't do that. (or let the query builder do this for you – they tend to be quite messy but will also work).
Just use this:
FROM source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are "repeating" the conditions again in the where clause.

I Need some sort of Conditional Join

Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
WHERE jt.someCol = conditions
OR #neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
(SELECT TOP(1) * FROM joinedTable jt
WHERE jt.someCol = someCol
AND #neededjoin is null)
(this is even more expensive than always left joining)
SELECT #sql = #sql + ' INNER JOIN joinedTable jt ' +
' ON jt.someCol = someCol ' +
' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regards to structuring my query. Given that I have three optional conditions, and I need the results from all of them my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense to do? since all of the code from the "SELECT from db" statement is the same? It appears that the union condition is more efficient, but my query is VERY long because of it....
Thanks!
LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND #neededjoin ...
...
OR
LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...
) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement it's evaluated... so don't have it in the SQL statement by using dynamic SQL or IF ELSE
the dynamic sql solution is usually the best for these situations, but if you really need to get away from that a series of if statments in a stroed porc will do the job. It's a pain and you have to write much more code but it will be faster than trying to make joins conditional in the statement itself.
I would go for a simple and straightforward approach like this:
DECLARE #ret TABLE(...) ;
IF <coondition one> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition two> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition three> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
SELECT DISTINCT ... FROM #ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we can not guess-timate performance. we must benchmark to determine it. Yet simpler code chunks are better for readability and maintainability.
Try this:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
AND jt.someCol = conditions
AND #neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
Update
If this doesn't give the performance I claimed, then perhaps that's because the last time I did this using joins to a table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
V.*,
LookedUpValue =
CASE M.TableCode
WHEN 'A' THEN A.Value
WHEN 'B' THEN B.Value
WHEN 'C' THEN C.Value
END
FROM
ValueMaster V
INNER JOIN JoinMap M ON V.Col1 = M.oOl1 AND V.Col2 = M.Col2
LEFT JOIN TableA A ON M.TableCode = 'A'
LEFT JOIN TableB B ON M.TableCode = 'B'
LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.
The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxilliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT RelevantColumn1, RelevantColumn2
FROM AuxilliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
Good luck...
Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num